Abstract

In this paper we study inference for a conditional model with a jump in
the conditional density, where the location and size of the jump are described
by regression lines. This structure is shared by several structural
econometric models. Two prominent examples are the standard auction model,
where the density jumps from zero to a positive value, and the equilibrium job
search model, where the density jumps from one level to another, inducing
kinks in the cumulative distribution function. This paper develops the
asymptotic inference theory for likelihood-based estimators of these models:
the Bayes and maximum likelihood estimators. Bayes and ML estimators are useful
classical procedures. While MLE is transformation invariant, Bayes estimators
offer some theoretical and computational advantages. They also have desirable
efficiency properties. We characterize the limit likelihood as a function of a
Poisson process that tracks the near-to-jump events and depends on regressors.
The approach is applied to an empirical model of a highway procurement
auction. We estimate the Pareto model of Paarsch (1992) and an alternative
flexible parametric model.

Key Words: Structural Econometric Model, Auctions, Job Search, Highway
Procurement Auction, Likelihood, Point Process, Stochastic Equisemicontinuity
1     Introduction

In  this  paper  we  consider  a  conditional  model  with  a  jump  in  the  conditional  density, 
whose  location  and  size  are  described  by  regression  lines.  This  model  was  first  proposed 
by  Aigner,  Amemiya,  and  Poirier  (1976)  in  the  context  of  production  analysis.  Many 
recent  econometric  models  also  share  this  interesting  structure.  For  example,  in  standard 
auction models, cf. Donald and Paarsch (1993a), the conditional density jumps from
zero to a positive value; in equilibrium job search models (Bowlus, Neumann, and Kiefer
(2001)), the density jumps from one level to another, inducing kinks in the distribution
function. In what follows, the former model is referred to as the one-sided or boundary
model, and the latter as the two-sided model. It is typical in these models that the location
of the jump is inseparably related to the parameters of the underlying structural economic
model. Learning the location parameters is thus crucial for learning the parameters of
the underlying economic model.

Several  important,  fundamental  papers  developed  inference  methods  for  such  models, 
including  Aigner,  Amemiya,  and  Poirier  (1976),  Ibragimov  and  Has'minskii  (1981),  Flinn 
and  Heckman  (1982),  Christensen  and  Kiefer  (1991),  Donald  and  Paarsch  (1996),  Donald 
and  Paarsch  (1993b),  Donald  and  Paarsch  (1993a),  Bowlus,  Neumann,  and  Kiefer  (2001). 
Ibragimov  and  Has'minskii  (1981)  (IH  afterwards)  obtained  the  limit  distributions  of  Bayes 
and maximum likelihood estimators (MLE) without covariates. Donald and Paarsch (1996)
dealt  with  MLE  in  the  one-sided  (boundary)  models  with  discrete  covariates. 

Nevertheless, the general inference problem posed by Aigner, Amemiya, and Poirier
(1976) has remained unresolved. The basic asymptotic properties of Bayes and ML
estimators in the general two-sided regression model are still unknown. The properties
of  Bayes  estimators  in  the  one-sided  model  and  the  properties  of  MLE  in  the  one-sided 
model  with  general  regressors  are  also  open  questions.  Without  understanding  these  basic 
properties,  using  classical  estimation  principles  in  these  econometric  applications  may  be 
questionable. 

In this paper, we develop the asymptotic theory of Bayes and maximum likelihood
estimators for a general conditional model of a density jump, including one-sided and
two-sided models with arbitrary covariates. Bayes estimators and MLE are attractive
estimation procedures. While MLE is transformation invariant, the Bayes estimators offer
some theoretical and computational advantages, and are convenient in practice. They also
have desirable efficiency properties.

Further details may be summarized as follows. We will show that the limit of the
likelihood process is a stochastic integral of a Poisson point process that tracks the
conditional near-to-jump events. The result is analogous in spirit to that of Chernozhukov
(2000), obtained for the extremal quantile regression.

It will be shown that Bayes estimators behave asymptotically as functions of the
likelihood limit. Unlike the usual case of regular parametric models, Bayes estimators are
not asymptotically equivalent to ML. In fact, Bayes estimators are efficient in terms of
finite-sample average-risk optimality and asymptotic average-risk optimality, which strongly
justifies their use.^1 We do not study the minimax criteria. A recent contribution by Hirano
and Porter (2001) offers a substantive analysis in this interesting direction in the context
of one-sided discrete covariate models.

^1 MLE is not optimal for loss functions in the conventional sense, but it may be shown to
be optimal for a generalized Dirac loss function.

We  will  also  demonstrate  that  the  MLE  behaves  asymptotically  as  a  function  of  the 
likelihood  limit.  Our  proof  uses  the  concept  of  stochastic  equi-semicontinuity  of  Knight 
(1999).  In  our  opinion,  the  result  makes  a  convincing  case  for  its  further  applications  in 
econometrics  and  statistics. 

Finally, we will study these methods in simulations and apply them to estimate models
of a highway procurement auction. The first model we estimate is a stylized Pareto model of
Paarsch (1992). The second one is a flexible parametric alternative to the nonparametric
model of Guerre, Perrigne, and Vuong (2000). We also implemented computer programs
with Markov chain Monte Carlo methods for the estimators. These programs are available
from the authors.

The  paper  is  organized  as  follows.  Section  2  describes  a  basic  linear  model.  Section 
3  develops  the  asymptotic  theory  for  this  model.  Section  4  considers  a  more  general 
nonlinear  model  with  nuisance  parameters.  Section  5  discusses  efficiency  issues.  Section 
6 discusses practical aspects of inference and estimation. Throughout the paper, c and
C denote generic positive constants; $\to_p$ and $\to_d$ denote convergence in probability and
distribution, respectively; and $|\cdot|$ denotes the supremum norm of a vector.

2     The  Basic  Model 

This  section  begins  with  a  basic  linear  model,  which  helps  establish  the  main  results  clearly. 
Extensions  to  general  nonlinear  models  are  given  in  section  4. 

2.1     Assumptions 

The basic model, denoted R, takes the following form

$Y_i = X_i'\beta + \epsilon_i$,  (1)

where the error $\epsilon_i$ has the conditional density $f(\epsilon|X_i,\beta)$, parameterized by $\beta$ belonging
to the set B, a compact, convex subset of $\mathbb{R}^d$. We denote the reference parameter as $\beta_0$
and assume $\beta_0 \in \mathrm{interior}(B)$. The conditional density has a jump at zero:

$\lim_{\epsilon \uparrow 0} f(\epsilon|x,\beta) = q(x,\beta)$,
$\lim_{\epsilon \downarrow 0} f(\epsilon|x,\beta) = p(x,\beta)$,  (2)
$p(x,\beta) > q(x,\beta) + \delta$, $\delta > 0$, $\forall x \in X$, $\forall \beta \in B$.

In other words, the conditional density of Y given X jumps at the location $X'\beta$, which
depends on the parameter $\beta$ and covariate X. The shape of the density may also depend
on the parameter $\beta$. In section 4, it will be made dependent on other parameters as well,
and $X'\beta$ will be generalized to a nonlinear function.



We  have  two  models  to  consider:  the  one-sided  model  and  the  two-sided  model.  The 
one-sided  model  has  its  conditional  density  jumping  from  zero  to  a  positive  constant.  The 
two-sided  model  has  its  conditional  density  jumping  from  one  positive  value  to  another 
positive  value.  Figure  1  illustrates  the  two  models.  The  one-sided  model  is  a  special  case 
of  the  two-sided  model.  In  addition,  as  suggested  by  Aigner,  Amemiya,  and  Poirier  (1976), 
the  two-sided  model  may  be  applied  to  one-sided  models,  using  the  lower  density  region 
to  account  for  outliers.  More  generally,  the  two-sided  model  approximates  models  with  a 
sharp  increase  in  the  density,  whose  location  depends  on  parameters  and  regressors.  The 
finite sample distribution of parameter estimates in such a model is approximated by
that in the model with a density jump.

It  is  typical  in  these  models  that  the  location  of  jump  is  indispensably  related  to  the 
parameters  of  the  underlying  structural  economic  model.  Learning  the  parameters  of 
location  is  thus  crucial  for  learning  the  parameters  of  the  underlying  economic  model. 

We maintain the following additional assumptions for model R.

Assumption 1     The following statements apply to x in X and $\beta$ in B:

(C.1) $(Y_i, X_i)$ is an i.i.d. sequence of vectors in $\mathbb{R} \times \mathbb{R}^d$, defined on $(\Omega, \mathcal{F}, P_\beta)$. $X_i$ has c.d.f.
$F_X$, with compact support X, that does not depend on $\beta$; $Var(X) > 0$.

(C.2) In addition to (1)-(2), uniformly in $\beta$ and x, either
i.  $q(x,\beta) > c > 0$, or
ii. $f(u|x,\beta) = q(x,\beta) = 0$ for $u < 0$.

(C.3) Except at $\epsilon = 0$, $f(\epsilon|x,\beta)$ has continuous derivatives in $\epsilon$ and $\beta$ that are bounded
uniformly in $\epsilon, x, \beta$. W.l.o.g. $f(\epsilon|x,\beta)$ is upper-semicontinuous at 0; its derivative is
dominated: $\sup_{\beta \in B} E_X \int |\frac{\partial}{\partial\beta} f(y - X'\beta|X;\beta)|\, dy < \infty$.

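(C.4) There exist $K > 0$, $C > 0$, $\delta > 0$ such that, uniformly in x and $\beta$: in case C.2.i, for
all $c, \epsilon \in \mathbb{R}$ with $|c| < K$,

$|\frac{\partial}{\partial\beta} \ln f(\epsilon + c|x,\beta)| < C(\epsilon,x) = C\, |\frac{\partial}{\partial\beta} \ln f(\epsilon|x,\beta)|^{1+\delta}$;

in case C.2.ii this applies only to $c, \epsilon$ with $\epsilon + c > 0$. Moreover, $\sup_\beta E_{P_\beta} C(\epsilon_i, X_i) < \infty$.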

Assumption  C.2  allows  for  the  boundary  case,  where  density  is  zero  to  the  left  side  of  the 
jump  and  is  positive  on  the  right  side.  It  also  allows  for  the  two-sided  case,  where  density 
is  positive  on  both  sides.  We  distinguish  these  two  cases  to  organize  the  proofs  better. 
C.3 and C.4 are needed for uniform convergence of the continuous part of the likelihood
ratio. They are satisfied as long as the derivative of the density is not ill-behaved in the
tails. Finally, the linearity of the regression function eases the exposition. Section 4 will
consider  a  more  general  non-linear  model  with  nuisance  parameters. 
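
To make the structure of model (1)-(2) concrete, the following minimal Python sketch simulates data whose conditional density jumps at $X'\beta$. The mixture-of-exponentials error, the regressor range, and the names `draw_error` and `simulate` are assumptions made for this illustration and are not taken from the paper. With weight w on the negative side, the density jumps at zero from q = w * rate_neg to p = (1 - w) * rate_pos, and w = 0 gives the one-sided (boundary) case.

```python
import random

def draw_error(rng, w, rate_neg=1.0, rate_pos=1.0):
    """Draw an error whose density jumps at 0 from q = w * rate_neg
    (left limit) to p = (1 - w) * rate_pos (right limit)."""
    if rng.random() < w:
        return -rng.expovariate(rate_neg)   # mass to the left of the jump
    return rng.expovariate(rate_pos)        # mass to the right of the jump

def simulate(n, beta, w, seed=0):
    """Simulate (Y_i, X_i), i <= n, from Y = X'beta + e with two regressors."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Illustrative assumption: two regressors with compact support [1, 2].
        x = (rng.uniform(1.0, 2.0), rng.uniform(1.0, 2.0))
        location = sum(xj * bj for xj, bj in zip(x, beta))
        data.append((location + draw_error(rng, w), x))
    return data
```

For example, `simulate(1000, (1.0, 0.5), w=0.2)` draws from a two-sided design with q = 0.2 and p = 0.8, while `w=0.0` places every observation on or above the regression line.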

2.2     Definition  and  Motivation  of  Bayes  and  ML  Estimators 

The likelihood function for the model is given by

$L_n(\beta) = \prod_{i \le n} f(Y_i - X_i'\beta | X_i; \beta)\, dF_X(X_i)$,


[Figure 1. Panel A: density and cdf of a two-sided model. Panel B: density and cdf of a
one-sided model.]

Figure 1: Panels B correspond to the one-sided model. Panels A correspond to the two-sided
model, which arises in equilibrium job search models and translates into kinks of cumulative
distribution functions; see Bowlus et al. (2001). The two-sided model also arises from one-sided
models when, with a low probability, we draw an outlier that ends up below the boundary of
the support. This model was initially proposed by Aigner et al. (1976). See also Hajari (1999)
for a robustness critique of the one-sided model in the auction context. Note that the location
of the density jump depends on parameters and regressors. Additional shape parameters will be
introduced in section 4.


and the ML estimator^2 is defined as

$\hat\beta_{ML} = \arg\min_{\beta \in B} -L_n(\beta)$.

On the other hand, the Bayes estimator minimizes the posterior expected loss

$\hat\beta = \arg\inf_{b \in B} \int_B \rho_n(b - \beta) \frac{L_n(\beta) q(\beta)}{\int_B L_n(\beta') q(\beta')\, d\beta'}\, d\beta$,

where $\rho_n(x) = \rho(nx)$ is a loss function, and $q(\cdot)$ is the prior density or weight function on
B. In the above expression, $L_n(\beta) q(\beta) / \int_B L_n(\beta') q(\beta')\, d\beta'$ is the posterior density conditional
on the data $(Y_i, X_i, i \le n)$. It does not depend on $dF_X(X_i)$. We impose the standard
assumptions on $\rho(\cdot)$ and $q(\cdot)$, cf. IH (1981).

Assumption 2

(D.1) $q(\cdot) > 0$ is continuous on B,

(D.2) $\rho(\cdot) \ge 0$ is convex, and is majorized by a polynomial of $|u|$ as $|u| \to \infty$.

Examples of conventional loss functions satisfying condition (D.2) include the quadratic
loss $\rho(z) = z'Wz$ for positive definite W and the absolute deviation loss $\rho(z) = A'\mathrm{abs}(z)$,
where $A > 0$ and $\mathrm{abs}(z) = (|z_j|)_{j \le d}$.
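
The two example losses, and the rescaling $\rho_n(x) = \rho(nx)$ used in the definition of the Bayes estimator, can be written down directly. The sketch below is illustrative only; the function names and the list-based vector representation are assumptions of this example.

```python
def quadratic_loss(z, W):
    """rho(z) = z' W z for a positive definite matrix W (given as a list of rows)."""
    d = len(z)
    return sum(z[i] * W[i][j] * z[j] for i in range(d) for j in range(d))

def absolute_loss(z, A):
    """rho(z) = A' abs(z) for a vector A of positive weights."""
    return sum(a * abs(zj) for a, zj in zip(A, z))

def rescale(rho, n):
    """Return rho_n with rho_n(x) = rho(n x); extra arguments (W or A) pass through."""
    return lambda x, *args: rho([n * xj for xj in x], *args)
```

Both functions are convex and grow polynomially in |z|, in line with (D.2).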

For symmetric and bowl-shaped loss functions, the ML and Bayes estimators are
efficient and asymptotically equivalent under asymptotic normality. If normality does not
apply, as in our case, the Bayes estimators and MLE are typically not asymptotically
equivalent. Therefore, MLE does not inherit the efficiency of Bayes estimators; Bayes
estimators are average-risk efficient under conventional loss functions, while MLE is not.^3
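
A small worked case may help fix ideas. Assume, purely for illustration, the simplest one-sided design without covariates: $Y_i = \beta + \epsilon_i$ with standard exponential errors. The likelihood is then proportional to $\exp(n\beta) 1[\beta \le \min_i Y_i]$, so the MLE is the sample minimum, while the Bayes estimator under quadratic loss and a flat prior on $[lo, hi]$ (with hi at or above the sample minimum) is the posterior mean, available in closed form. The function names and the prior bound `lo` are assumptions of this sketch.

```python
import math

def mle(y):
    """MLE of the boundary: exp(n*b) * 1[b <= min(y)] is maximized at min(y)."""
    return min(y)

def bayes_posterior_mean(y, lo):
    """Posterior mean under a flat prior on [lo, hi] with hi >= min(y) > lo.
    This is the Bayes estimator under quadratic loss in this toy design."""
    n, m = len(y), min(y)
    t = math.exp(-n * (m - lo))   # correction due to the lower prior bound
    return (m - lo * t) / (1.0 - t) - 1.0 / n
```

When the lower bound is far from the data, the posterior mean is approximately the sample minimum shifted down by 1/n, so the two estimators differ at the same order as the estimation error itself.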

3     Asymptotic  Theory  for  The  Basic  Model 

We  begin  with  the  asymptotic  behavior  of  the  likelihood  process,  and  then  proceed  to 
asymptotic  distributions  of  Bayes  and  ML  estimators. 

3.1     Likelihood  Limit 

In modern asymptotic analysis, a common first step is to find the finite-dimensional or
marginal limit of the likelihood ratio process. The limit eventually serves to describe the
asymptotic distribution of Bayes and ML estimators. Such an initial step is called the
convergence of experiments; see van der Vaart and Wellner (1996).
Consider the local likelihood ratio function

$\ell_n(z) = L_n(\beta_n + z/n) / L_n(\beta_n)$,

where $\beta_n = \beta_0 + \delta/n$, $\delta \in \mathbb{R}^d$, denotes the true parameter sequence. This is needed to
study asymptotic efficiency later. Finite-dimensional (fi-di) weak convergence means that
for any finite J

$(\ell_n(z_j), j \le J) \to_d (\ell_\infty(z_j), j \le J)$,  (3)

and $\ell_\infty(\cdot)$ is called a fi-di limit. In this section, $\to_d$ denotes convergence in distribution
under $P_{\beta_n}$. Define $p(X) = p(X,\beta_0)$ and $q(X) = q(X,\beta_0)$.

^2 $dF_X(X_i)$ does not depend on $\beta$ and is hence irrelevant.

^3 See for example van der Vaart (1999). Efficiency for MLE could be claimed for the "delta loss" but
not for quadratic or absolute deviation loss.

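Theorem 1    Given assumption 1, the fi-di weak limit of the likelihood ratio $\ell_n(z)$ equals

$\ell_\infty(z) = \exp\{z' E X [p(X) - q(X)]\} \times \exp\{\int_E l_z(j,x)\, dN(j,x)\}$,

where

$l_z(j,x) = \ln\frac{q(x)}{p(x)}\, 1[0 < j < x'z] + \ln\frac{p(x)}{q(x)}\, 1[0 > j > x'z]$,

N is a Poisson random measure $N(\cdot) = \sum_{i=1}^{\infty} 1[(J_i, X_i) \in \cdot] + \sum_{i=1}^{\infty} 1[(J_i', X_i') \in \cdot]$, where
$\{X_i\}$ are i.i.d. with d.f. $F_X$, and $\{X_i'\}$ is an independent copy of $\{X_i\}$,

$J_i = \Gamma_i / p(X_i)$,    $\Gamma_i = \mathcal{E}_1 + \dots + \mathcal{E}_i$,
$J_i' = \Gamma_i' / q(X_i')$,   $\Gamma_i' = -(\mathcal{E}_1' + \dots + \mathcal{E}_i')$,  (4)

$\{\mathcal{E}_i\}$ and $\{\mathcal{E}_i'\}$ are two i.i.d., mutually independent sequences of standard exponential
random variables that are also independent of $\{X_i\}$ and $\{X_i'\}$.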

The result has an intuitive appeal. The limit likelihood $\ell_\infty(z)$ has two informative
parts. The deterministic "outer" part,

$\ell_1(z) = \exp\{z' E X [p(X) - q(X)]\}$,

can be regarded as information created by the far-from-jump data. The "inner" stochastic
part,

$\ell_2(z) = \exp\{\int_E l_z(j,x)\, dN(j,x)\} = \exp\{\sum_{i=1}^{\infty} l_z(J_i, X_i) + \sum_{i=1}^{\infty} l_z(J_i', X_i')\}$,

can be interpreted as information created by near-to-jump data.

N is an asymptotic model of the near-to-jump data. As equation (4) shows, the points
of N, $(J_i, X_i)$ and $(J_i', X_i')$, depend on the regressors in a complex way. N is the limit of
the point process

$\hat N(\cdot) = \sum_{i: \epsilon_i > 0} 1[(n\epsilon_i, X_i) \in \cdot] + \sum_{i: \epsilon_i < 0} 1[(n\epsilon_i, X_i) \in \cdot]$.

The measure $\hat N(A)$ counts the number of points in any given set A. For any bounded set A,
the limit behavior of $\hat N(A)$ depends only on near-to-jump errors $n\epsilon_i$ and the corresponding
covariate values. The smallest $|\epsilon_i|$'s are the ones that matter and, scaled by n, they converge
in law to mutually dependent gamma variables. Furthermore, in large samples the likelihood is
driven mainly by near-to-jump data, revealing $\beta$ at the $O_p(n^{-1})$ rate. The fast convergence
rate is not surprising. In the simplest one-sided case with no covariates, MLE is the minimal
order statistic, which converges to the end-point of the support at the $O_p(n^{-1})$ rate.
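
The gamma limit of the scaled near-to-jump errors can be checked numerically. The sketch below assumes, for illustration only, a one-sided design with standard exponential errors (so that p = 1) and no covariates; it averages n times the k smallest errors over repeated samples. Under these assumptions the averages should be close to 1 and 2, the means of gamma variables with shapes 1 and 2. The function names are hypothetical.

```python
import heapq
import random

def scaled_smallest(n, k, rng):
    """Return n times the k smallest of n standard exponential errors, in increasing order."""
    errors = (rng.expovariate(1.0) for _ in range(n))
    return [n * e for e in heapq.nsmallest(k, errors)]

def mean_scaled_smallest(n, k, reps, seed=0):
    """Monte Carlo average of the scaled k smallest errors over `reps` samples."""
    rng = random.Random(seed)
    totals = [0.0] * k
    for _ in range(reps):
        for j, value in enumerate(scaled_smallest(n, k, rng)):
            totals[j] += value
    return [t / reps for t in totals]
```

The smallest scaled error alone is n times the sample minimum, which is the normalized MLE error in this toy design.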

Note also an important simplification of the formulae in the one-sided case. Since
$q(x) = 0$,

$\ell_1(z) = \exp\{z' E X p(X)\}$,
$\ell_2(z) = 1$ if $J_i > X_i'z$ for all i, and $\ell_2(z) = 0$ otherwise.

The inner part $\ell_2$ is very informative in assigning zero likelihood to certain values of
z. Otherwise, $\ell_2(z)$ is flat. Once $\ell_2(z)$ equals 1, the outer part $\ell_1(z)$ further shapes the
likelihood. In the two-sided model, when $q(x) > 0$, no z is assigned a zero inner likelihood.
Both $\ell_1$ and $\ell_2$ shape the limit likelihood.
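
The one-sided formulae above can be sketched in a few lines. The example assumes a scalar regressor uniform on [1, 2] and a constant right limit p(x) = p0, so that E[X p(X)] = 1.5 * p0; these choices, and the function names, are illustrative assumptions rather than the paper's design. Points are generated as in (4) and truncated at `j_max`, which is harmless for any z with 2z <= j_max because the regressor is bounded by 2.

```python
import math
import random

def poisson_points(rng, p0, j_max):
    """Points (J_i, X_i) of N with J_i <= j_max in the one-sided case."""
    pts, gamma = [], 0.0
    while True:
        gamma += rng.expovariate(1.0)      # Gamma_i = E_1 + ... + E_i
        j = gamma / p0                     # J_i = Gamma_i / p(X_i), here p(x) = p0
        if j > j_max:
            return pts
        pts.append((j, rng.uniform(1.0, 2.0)))

def limit_likelihood(z, pts, p0, mean_x=1.5):
    """l_inf(z) = l_1(z) * l_2(z) for a scalar local parameter z."""
    if any(j <= x * z for j, x in pts):    # inner part: some J_i <= X_i z
        return 0.0
    return math.exp(z * mean_x * p0)       # outer part: exp{z E[X p(X)]}
```

In this scalar design the limit likelihood increases in z until the first point is hit, so its maximizer is the smallest ratio J_i / X_i among the points.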

3.2     Large  Sample  Properties  of  Bayes  Estimators 

The normalized Bayes estimator $Z_n = n(\hat\beta - \beta_n)$ is related to the likelihood ratio process
by minimizing the posterior loss:

$\Gamma_n(z) = \int_{U_n} \rho(z - u)\, \pi_n(u)\, du$,

where $\pi_n(u)$ is the posterior density on the rescaled parameter space $U_n = n(B - \beta_n)$:

$\pi_n(u) = \ell_n(u)\, q(\beta_n + u/n) / \int_{U_n} \ell_n(u')\, q(\beta_n + u'/n)\, du'$.

Here $\rho(z)$ is the loss function, $\ell_n(z)$ is the likelihood ratio process defined in the previous
section, and $q(\beta)$ is the prior density function.

As $n \to \infty$, $U_n$ approaches $\mathbb{R}^d$ and the posterior $\pi_n(u)$ approaches the limit $\pi_\infty(u) =
\ell_\infty(u) / \int_{\mathbb{R}^d} \ell_\infty(u')\, du'$. The limit posterior $\pi_\infty$ is a function of the likelihood limit only; it
is free from prior information. The result is thus simple to conjecture.

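Theorem 2  Suppose assumptions 1 and 2 hold, and define, for $\ell_\infty(\cdot)$ specified in Theorem 1,

$\Gamma_\infty(z) = \int_{\mathbb{R}^d} \rho(z - u) \frac{\ell_\infty(u)}{\int_{\mathbb{R}^d} \ell_\infty(u')\, du'}\, du$.

Suppose that $Z_\infty = \arg\min_{z \in \mathbb{R}^d} \Gamma_\infty(z)$ is uniquely defined in $\mathbb{R}^d$ a.s. (*). Then

$Z_n \to_d Z_\infty$.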

Remark 3.1  The condition (*) is automatic for strictly convex functions $\rho(z)$ with unique
minimum at $z = 0$, since $\ell_\infty(z)$ is positive a.s. on a subset of $\mathbb{R}^d$ with positive Lebesgue
measure by the assumed non-degeneracy of X.
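
To see what the limit in Theorem 2 looks like in a concrete case, consider again the one-sided model without covariates (X = 1), an assumption made here only for illustration. The limit likelihood is then $\exp(pu) 1[u < J_1]$, and the limit posterior is an exponential density reflected at $J_1$. The sketch below minimizes the limit posterior loss numerically on a grid; under this assumption the minimizer should be close to $J_1 - 1/p$ for quadratic loss (the posterior mean) and $J_1 - \ln 2 / p$ for absolute loss (the posterior median). Function names, grid sizes, and the truncation `span` are choices of this example.

```python
import math

def posterior_loss(z, j1, p, loss, span=30.0, steps=2000):
    """Gamma_inf(z) for pi_inf(u) proportional to exp(p*u) on u < j1 (midpoint rule)."""
    lo = j1 - span / p                      # truncate the negligible far-left tail
    h = (j1 - lo) / steps
    num = den = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * h
        w = math.exp(p * (u - j1))          # unnormalized limit posterior
        num += loss(z - u) * w
        den += w
    return num / den

def bayes_limit(j1, p, loss, width=5.0, grid=401):
    """Grid minimizer of the limit posterior loss over [j1 - width, j1]."""
    zs = [j1 - width + width * k / (grid - 1) for k in range(grid)]
    return min(zs, key=lambda z: posterior_loss(z, j1, p, loss))
```

In both cases the minimizer lies strictly below $J_1$, which is the limit of the normalized MLE in this design, so the two estimators have different limit distributions.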


3.3     Maximum  Likelihood 

MLE $Z_n = n(\hat\beta_{ML} - \beta_n)$ maximizes the local likelihood ratio process:^4

$Z_n = \arg\min_{z \in U_n} -\ell_n(z)$.

Because $\ell_n(\cdot)$ is a highly non-regular function, the standard uniform convergence
arguments are not applicable. One approach, taken by IH (1981), treats $\ell_n(\cdot)$ as an element
of a Skorohod space. There are substantive difficulties with this approach in the
regression case where there is more than one parameter. Instead, we employ Knight's stochastic
equisemicontinuity, which converts the finite-dimensional convergence of discontinuous
objective functions into convergence of argmins.^5 Appendix A provides a brief discussion of
this new concept.

Theorem 3  Suppose assumption 1 holds and that $-\ell_\infty(z)$ attains a unique minimum in $\mathbb{R}^d$
a.s. Then

$Z_n \to_d Z_\infty = \arg\min_{z \in \mathbb{R}^d} -\ell_\infty(z)$.

Remark 3.2  The condition that $-\ell_\infty(z)$ attains a unique minimum a.s. is needed;
otherwise the limit distribution may fail to exist.

The important special case of MLE with discrete covariates in the one-sided model has
been studied in the remarkable pioneering work of Donald and Paarsch (1996) and Donald
and Paarsch (1993a). The results obtained here extend to continuous covariates and,
importantly, to two-sided cases.

4     Nonlinear  Model  with  Nuisance  Parameters 

In this section we consider a more general nonlinear model and also introduce nuisance
parameters. While the linear model conveys the basic flavor and allows for a better
explanation of the proofs, the nonlinear setup conforms with the economic models described in
the introduction. The generalized model, denoted R, is given by

$Y_i = g(X_i, \beta) + \epsilon_i$,

where the error $\epsilon_i$ has conditional density $f(\epsilon|X_i, \beta, \alpha)$, parameterized by $\beta \in B \subset \mathbb{R}^{d_1}$
and $\alpha \in A \subset \mathbb{R}^{d_2}$. We assume that the set $Q = B \times A$ is compact and convex and that
the reference parameter $\gamma_0 = (\beta_0, \alpha_0)$ belongs to the interior of this set.

^4 If $Z_n$ is set-valued, we may choose any measurable solution. Alternatively, define $Z_n$ as any measurable
$\epsilon_n$-approximate argmin: $Z_n$ is s.t. $-\ell_n(Z_n) \le \inf_{z \in U_n} -\ell_n(z) + \epsilon_n$, $\epsilon_n \searrow 0$. Allowing approximate
solutions is useful for situations in which it may be difficult to find the exact optimum.

^5 MLE is a special Bayes estimator minimizing the posterior loss $\Gamma_n(z) = \int \delta_z(u) \ell_n(u)\, du = \ell_n(z)$,
where $\delta_z(\cdot)$ is the delta function, which is too irregular to be a subject of the previous section.


The size of the jump of the conditional density of $\epsilon_i$ at 0, given $X_i$, may depend on
both $\beta$ and $\alpha$:

$\lim_{\epsilon \uparrow 0} f(\epsilon|x,\beta,\alpha) = q(x,\beta,\alpha)$,
$\lim_{\epsilon \downarrow 0} f(\epsilon|x,\beta,\alpha) = p(x,\beta,\alpha)$,  (5)
$p(x,\beta,\alpha) > q(x,\beta,\alpha) + \delta$, $\delta > 0$, $\forall x \in X$, $(\beta,\alpha) \in \mathbb{R}^d$, $d = d_1 + d_2$.

In other words, the conditional density of Y given X jumps at the location $g(X,\beta)$,
which depends nonlinearly on the parameter $\beta$ and covariate X. The shape of the density
depends on the parameters $\beta$, $\alpha$ and covariates X. The additional shape parameter $\alpha$ is
not related to the parameter of the location function. This model is therefore considerably
more flexible than the basic linear model.

The assumptions and results for the nonlinear model are very similar to those for the
linear model. However, the presence of nuisance parameters adds to the complexity of exposition.
We make the following additional technical assumptions:

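Assumption 3    The following statements apply to x in X and $\gamma = (\beta, \alpha)$ in Q:

(E.1) $(Y_i, X_i)$ is an i.i.d. sequence of vectors in $\mathbb{R} \times \mathbb{R}^d$, defined on $(\Omega, \mathcal{F}, P_\gamma)$. $X_i$ has c.d.f.
$F_X$, with compact support X. (5) holds, and uniformly in $\beta$, $\alpha$, and x, either
i.  $q(x,\beta,\alpha) > c > 0$, or
ii. $f(\epsilon|x,\beta,\alpha) = q(x,\beta,\alpha) = 0$ for $\epsilon < 0$.

(E.2) The density $f(\epsilon|x,\gamma)$ has continuous derivatives in $\epsilon$, $\gamma$ for each $\epsilon$, x and $\gamma$, except at $\epsilon = 0$,
and is bounded uniformly in $\epsilon, x, \gamma$; it has a continuous and bounded second derivative
in $\alpha$, uniformly in $\epsilon, x, \gamma$. W.l.o.g. $f(\epsilon|x,\gamma)$ is upper-semicontinuous at $\epsilon = 0$ for
each x and $\gamma$; and $\sup_{\gamma \in Q} E_X \int |\frac{\partial}{\partial\gamma} f(y - g(X,\beta)|X;\gamma)|\, dy < \infty$.

(E.3) $g(x,\beta)$ has two continuous and bounded derivatives w.r.t. $\beta$, uniformly in x and $\beta$.
$Var[\partial g(X,\beta)/\partial\beta]$ is positive definite uniformly in $\beta$.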

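(E.4) Let $l_i(\gamma') = \ln f(Y_i - g(X_i,t); X_i, \gamma')$. For $\gamma' = (t,s)$ in an open ball at $\gamma = (\beta,\alpha)$,
either (a) $E_{P_\gamma} [\frac{\partial}{\partial\gamma'} l_i(\gamma')] [\frac{\partial}{\partial\gamma'} l_i(\gamma')]'$ is uniformly nonsingular and bounded, or (b)
$\frac{\partial}{\partial t} f(y - g(x,t); x, \gamma) = 0$ and $E_{P_\gamma} [\frac{\partial}{\partial s} l_i(\gamma')] [\frac{\partial}{\partial s} l_i(\gamma')]'$ is uniformly nonsingular
and bounded; (a) and (b) hold uniformly in $\gamma$.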

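(E.5) There exist $K > 0$, $C > 0$, $\delta > 0$ such that for any x and $\gamma$: in case E.1.i, for all
$c, \epsilon \in \mathbb{R}$ with $|c| < K$, $|\frac{\partial}{\partial\gamma} \ln f(\epsilon + c|x,\gamma)| < C(\epsilon,x)$, and in case E.1.ii this only needs to
hold for all $c, \epsilon$ with $\epsilon + c > 0$. Moreover, $\sup_\gamma E_{P_\gamma} C(\epsilon_i, X_i) < \infty$.

(E.6) Under the same conditions, second derivatives are dominated: $|\frac{\partial^2}{\partial\gamma \partial\gamma'} \ln f(\epsilon + c|x,\gamma)|
< C''(\epsilon,x)$ and $\sup_\gamma E_{P_\gamma} C''(\epsilon_i, X_i) < \infty$.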

The parameters include location parameters $\beta$ and shape parameters $\alpha$. If $\beta$ is known,
the inference about $\alpha$ is regular. Thus, the assumptions E.1-E.6 reflect a mixture of
non-regular assumptions, as in section 3 and in IH (1981), and regular ones, as in van der
Vaart and Wellner (1996), chapter 7 (mean-square differentiability). Conditions E.2-E.3
impose reasonable smoothness on the location function and the density function. E.4
imposes standard mean-square differentiability and a finite information matrix for the
shape parameter $\alpha$. E.5 and E.6 impose a standard domination on the score function and
its derivative.

We  next  derive  the  limit  likelihood  process,  followed  by  the  asymptotics  of  Bayes  and 
ML  estimators. 

4.1      Limits  of  Likelihood 

The likelihood for this model is of the form

$L_n(\gamma) = \prod_{i \le n} f(Y_i - g(X_i,\beta)|X_i;\gamma)\, dF_X(X_i)$,

where $dF_X(X_i)$ does not depend on $\gamma$ and factors out. ML and Bayes estimators, $\hat\gamma_{ML}$ and
$\hat\gamma_{Bayes}$, are defined as in section 2. The convergence rates needed to obtain a nondegenerate
limiting likelihood ratio process are given by n and $\sqrt{n}$, for $\beta$ and $\alpha$ respectively. Define
$H_n$ as a diagonal matrix with $1/n$ in the first $\dim(\beta)$ diagonal entries and $1/\sqrt{n}$ in the
remaining diagonal entries. Let $\gamma_n(z) = \gamma_n + H_n z$ for $z = (u,v) \in \mathbb{R}^d$. The local likelihood
ratio process is given by $\ell_n(z) = L_n(\gamma_n(z)) / L_n(\gamma_n)$.
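
The mixed-rate localization $\gamma_n(z) = \gamma_n + H_n z$ is easy to mis-index in code, so a small sketch may be useful. The function name and the list representation are assumptions of this example; the first `dim_beta` coordinates are treated as location parameters (rate n) and the rest as shape parameters (rate $\sqrt{n}$).

```python
import math

def local_parameter(gamma_n, z, n, dim_beta):
    """Return gamma_n + H_n z with H_n = diag(1/n, ..., 1/n, 1/sqrt(n), ..., 1/sqrt(n))."""
    shifted = []
    for k, (g, zk) in enumerate(zip(gamma_n, z)):
        scale = 1.0 / n if k < dim_beta else 1.0 / math.sqrt(n)
        shifted.append(g + scale * zk)
    return shifted
```

For instance, with n = 100 and one location parameter, a local shift z = (3, 4) moves the location coordinate by 0.03 and the shape coordinate by 0.4.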

In this section, $\to_d$ denotes convergence in distribution under $P_{\gamma_n}$, where $\gamma_n = (\beta_n, \alpha_n) =
\gamma_0 + H_n \delta$ for given $\delta \in \mathbb{R}^d$.


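Theorem 4  In model R, given assumption 3, the finite-dimensional weak limit of the
localized likelihood ratio $\ell_n(z)$ takes the following form: for $\Delta(x) \equiv \partial g(x,\beta_0)/\partial\beta$,

$\ell_\infty(z) = \ell_{1\infty}(v) \times \ell_{2\infty}(u)$,
$\ell_{1\infty}(v) = \exp(W'v - \frac{1}{2} v' J_{\gamma_0} v)$,
$\ell_{2\infty}(u) = \exp(u' E \Delta(X) [p(X) - q(X)]) \times \exp(\int_E l_u(j,x)\, dN(j,x))$,
$J_{\gamma_0} = E [\frac{\partial}{\partial\alpha} l_i(\gamma_0)] [\frac{\partial}{\partial\alpha} l_i(\gamma_0)]'$,

where $l_u(j,x) = \ln\frac{q(x)}{p(x)}\, 1[0 < j < \Delta(x)'u] + \ln\frac{p(x)}{q(x)}\, 1[0 > j > \Delta(x)'u]$, N is the Poisson
process in Theorem 1, and W is normally distributed, $N(0, J_{\gamma_0})$, and independent of N.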

The result differs from that in section 3. First, $\Delta(X) = \partial g(X,\beta_0)/\partial\beta$ replaces X,
as expected. Second, we have a new term $\ell_{1\infty}(v)$, the log of which is a normal random
variable with its variance inversely related to the information matrix. Thus, $\ell_{1\infty}(v)$ is a
standard term for regular likelihood inference, e.g. van der Vaart and Wellner (1996), ch.
7. Note that if the parameter $\beta$ were known, we would end up only with the standard term
$\ell_{1\infty}(v)$. Since $\beta$ needs to be estimated, we have the mixture of "regular" information about
the shape parameters $\alpha$ and the "non-regular" information about the location parameters
$\beta$. Moreover, these information components are asymptotically independent.
4.2     Asymptotic Behavior of Bayesian and ML Estimators

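Next consider the normalized Bayes estimator $Z_n = (Z_n^\beta, Z_n^\alpha) = (n(\hat\beta - \beta_n), \sqrt{n}(\hat\alpha - \alpha_n))$.

Theorem 5 (Bayesian Asymptotics for Nonlinear Models)  Assume model R,
assumptions 2 and 3, and that $\rho(z) = \rho_1(u) + \rho_2(v)$. For $\ell_\infty(\cdot)$ in Theorem 4 define:

$\Gamma_\infty(z) = \int_{\mathbb{R}^d} \rho(z - z') \frac{\ell_\infty(z')}{\int_{\mathbb{R}^d} \ell_\infty(z'')\, dz''}\, dz'$.

1. Suppose also that $Z_\infty = \arg\min_{z \in \mathbb{R}^d} \Gamma_\infty(z)$ is uniquely defined in $\mathbb{R}^d$ a.s. (*). Then

$Z_n \to_d Z_\infty$.

2. $Z_n^\beta \to_d Z_\infty^\beta = \arg\min_u \int_{\mathbb{R}^{d_1}} \rho_1(u - u')\, \ell_{2\infty}(u')\, du'$ and $Z_n^\alpha \to_d Z_\infty^\alpha = \arg\min_v \int_{\mathbb{R}^{d_2}}
\rho_2(v - v')\, \ell_{1\infty}(v')\, dv'$. $Z_\infty^\beta$ and $Z_\infty^\alpha$ are independent.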

Note that the independence is due to the multiplicative separability of $\ell_\infty(z)$ into $\ell_{1\infty}(v)$ and
$\ell_{2\infty}(u)$ and the additive separability of $\rho(z)$. If the additive separability does not hold, part
1 of Theorem 5 still applies, while part 2 does not. Consider next the normalized
MLE $Z_n = (Z_n^\beta, Z_n^\alpha) = (n(\hat\beta_{ML} - \beta_n), \sqrt{n}(\hat\alpha_{ML} - \alpha_n))$.

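Theorem 6 (MLE Asymptotics for Nonlinear Models)  Under model R,
assumption 3, and assuming that $-\ell_\infty(z)$ attains a unique minimum a.s.,

$Z_n \to_d Z_\infty = \arg\min_{z \in \mathbb{R}^d} -\ell_\infty(z)$.

By multiplicative separability of $\ell_\infty(z)$, we have $Z_n^\alpha \to_d Z_\infty^\alpha = J_{\gamma_0}^{-1} W = N(0, J_{\gamma_0}^{-1})$
and $Z_n^\beta \to_d Z_\infty^\beta = \arg\min_{u \in \mathbb{R}^{d_1}} -\ell_{2\infty}(u)$. $Z_\infty^\alpha$ and $Z_\infty^\beta$ are independent.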

These results generalize Theorems 2 and 3. In view of the asymptotic independence
between the shape information and location information, the estimators for these parameters
are asymptotically independent. Also, the limit distribution of the Bayes estimator of the
shape parameter $\alpha$ coincides with that of MLE if the loss function $\rho_2$ is symmetric (by
Anderson's lemma). This is not the case for the estimators of the location parameter $\beta$.

5     Efficiency 

The Bayes estimators are exactly finite-sample average-risk efficient (ARE) under particular
loss functions. This is an instance of a well-known result, formally stated in Theorem 7.
Theorem 8 makes this statement an asymptotic one. These results justify one of the main
efforts of this paper: the study of Bayes estimators. The ML estimators of location
parameters are not equivalent to Bayes estimators even asymptotically and, unlike the
usual case, do not share the optimality of Bayes estimators in large samples.

Average risk efficiency is one of the classic efficiency concepts developed by Wald,
Lehmann, and others. Before writing it down formally for our case, it is helpful to review
the basic idea. Given a parameter $\gamma$, an estimator $\hat\gamma$, and a loss function $\rho_n(x) = \rho(H_n^{-1} x)$,
we can compute the expected risk as $E_{P_\gamma} \rho_n(\hat\gamma - \gamma)$. The average risk takes the form
$\int_Q E_{P_\gamma} \rho_n(\hat\gamma - \gamma)\, q(\gamma)\, d\gamma$, where q is a weight function (e.g. $q(\gamma) = 1$).

To address the asymptotic results, consider the following notation. Define $H_n$ as in
section 4, and let $\gamma_n(\delta) = \gamma_0 + H_n \delta$. Consider all statistics (measurable mappings of data)
$\hat\gamma_n = \hat\gamma_n((Y_i, X_i)_{i \le n})$, and denote the set of all such mappings as $F_n$.

Define the (exact) average risk criterion (ARC) as

$R_{\rho,q}(\hat\gamma_n, K) = \left[ \int_K E_{P_{\gamma_n(\delta)}} [\rho(H_n^{-1} [\hat\gamma_n - \gamma_n(\delta)])]\, q(\gamma_n(\delta))\, d\delta \right] / \mathrm{Leb}(K)$,

where q is the weight or prior measure (e.g. uniform) and $\rho$ is the loss function, defined
earlier. Division by $\mathrm{Leb}(K)$ is immaterial at this point.

Theorem 7 (Finite-Sample ARE) Suppose model R and conditions E.1-E.6 hold.
For $f_{Bayes} \in F_n$, defined by loss function $\rho$ and prior weight $q$:

$$f_{Bayes} \in \arg\inf_{f \in F_n} R_{\rho,q}(f, U_n).$$

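We next define asymptotic average risk (AARC) as

$$R_\rho(\{f_n\}, K) = \limsup_{n\to\infty} R_{\rho,\mathrm{Leb}}(f_n, K),$$

for a compact cube $K$ of $\mathbb{R}^d$ with center 0, and a sequence of estimators $\{f_n\}$ in $\{F_n\}$. To
extend this definition to the entire $\mathbb{R}^d$, define

$$R_\rho(\{f_n\}, \mathbb{R}^d) = \limsup_{K \uparrow \mathbb{R}^d} \left[R_\rho(\{f_n\}, K)\right],$$

where $K \uparrow \mathbb{R}^d$ denotes an increasing sequence of cubes converging to $\mathbb{R}^d$.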

Theorem 8 (Asymptotic ARE) Suppose model R and conditions E.1-E.6 hold. For
$\{f_{Bayes}\} \in \{F_n\}$, defined by loss $\rho$ and prior (weight) $q$:

$$\inf_{\{f_n\} \in \{F_n\}} R_\rho(\{f_n\}, \mathbb{R}^d) = R_\rho(\{f_{Bayes}\}, \mathbb{R}^d).$$

Because Bayes and ML estimators of location parameters are not asymptotically equivalent
(equivalence holds for the shape parameter $\alpha$ under symmetric loss function $\rho_2$), the
ML estimators are not optimal under the convex loss functions considered here. This is in
contrast to the usual case where, for a large class of loss functions, MLE is asymptotically
equivalent to the Bayes estimator and shares its optimality.

We next examine whether theoretical efficiency translates into actual efficiency gains
with a simple Monte Carlo example. We consider simple one-sided and two-sided models with
two covariates. In the one-sided case we generate data as

$$Y_0 \sim \mathrm{Uniform}\left[\bar c/m + (1 - 1/m)\,\underline{c},\ \bar c\right],$$
$$\underline{c} = \bar c - \beta_1 X_1 - \beta_2 X_2, \qquad X_j \sim \mathrm{Uniform}(a, b),\ j = 1, 2.$$
Table 1: Mean Squared Error For Two Simulation Experiments

                          MLE                     Bayes
                    $\beta_1$   $\beta_2$     $\beta_1$   $\beta_2$
One-sided Model     0.8769      2.2030        0.0473      0.0395
Two-sided Model     0.1761      0.1244        0.0587      0.0502

The distribution in this example can be rationalized by a simple procurement auction
model, in which there are $n$ auctions. In each auction there are $m$ bidders. Each bidder
draws a random cost $C$ from the uniform distribution on $(\underline{c}, \bar c)$. We only observe the
submitted bids, which depend on $C$ through the Bayesian Nash equilibrium bid function
$Y = \bar c/m + (1 - 1/m)\,C$, resulting in the model above.

The two-sided example is a contaminated version of the first example. In particular,
the data are generated from:

$$Y = Y_0 \text{ with prob. } \lambda \quad\text{and}\quad Y \sim \mathrm{Uniform}\left(L,\ \bar c/m + (1 - 1/m)\,\underline{c}\right) \text{ with prob. } 1 - \lambda,$$

where we chose $\lambda = 0.9$ and $L = 2$. We simulate the above two models, using $\beta_1 = \beta_2 = 0.5$
for $n = 200$. Table 1 reports the sum of mean square errors across 300 simulations. The
Bayes estimator has a substantively smaller mean square error than MLE.
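The efficiency gap can be seen in closed form in a much simpler setting than the one simulated above. The following is a minimal sketch, not the paper's design: it assumes a covariate-free boundary model $Y = \theta + \mathrm{Exp}(1)$, for which the MLE is the sample minimum and the flat-prior posterior mean is the minimum minus $1/n$. The function name `simulate_mse` and all settings are illustrative assumptions.

```python
import numpy as np

def simulate_mse(n=200, reps=20000, theta=1.0, seed=0):
    """Monte Carlo MSE of the MLE and the flat-prior posterior mean
    in the toy boundary model Y = theta + Exp(1) (an assumed example)."""
    rng = np.random.default_rng(seed)
    y = theta + rng.exponential(1.0, size=(reps, n))
    # The likelihood is proportional to exp(n * t) on t <= min(y),
    # so it is maximized at the sample minimum.
    mle = y.min(axis=1)
    # Under a flat prior the posterior density is n * exp(n * (t - min(y)))
    # on t <= min(y), whose mean is min(y) - 1/n.
    bayes = mle - 1.0 / n
    return np.mean((mle - theta) ** 2), np.mean((bayes - theta) ** 2)

mse_mle, mse_bayes = simulate_mse()
print(mse_mle, mse_bayes, mse_mle / mse_bayes)
```

In this toy model the exact values are $2/n^2$ for the MLE and $1/n^2$ for the posterior mean, so the simulated ratio should be close to 2.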

6     Confidence  Intervals  and  Some  Practical  Questions 

The  obtained  results  enable  the  construction  of  confidence  sets. 

Confidence Intervals. We must distinguish between the estimates of the shape
parameter $\alpha$ and the estimates of the location parameters $\beta$.

Inference about the $\alpha$ parameters is fully regular. The limit distribution of either MLE or
Bayes (for symmetric loss function $\rho_2$) is given by $N(0, J_{\alpha\alpha}^{-1})$. To facilitate inference we
need to estimate $J_{\alpha\alpha}$. This can be done by conventional methods, taking the parameter
estimate $\hat\beta$ as given.

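Corollary 1 Under assumption 3, for Bayes or MLE $\hat\alpha, \hat\beta$:

$$\hat J_{\alpha\alpha} = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2}{\partial\alpha\,\partial\alpha'} \ln f\left(Y_i - g(X_i, \hat\beta) \mid X_i, \hat\beta, \hat\alpha\right) \xrightarrow{p} J_{\alpha\alpha}.$$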

An alternative is the familiar outer product of scores in $\alpha$, which we do not state for
brevity. The resampling methods available for inference about $\alpha$ include subsampling and
the bootstrap. Although the bootstrap is not investigated formally, we conjecture that it works
due to Mammen's theorem and asymptotic normality (see Horowitz (2000)).

Inference about the parameter $\beta$ poses more difficulties. Neither Bayes nor ML estimators
have a standard limit distribution. The nonparametric bootstrap is not consistent in
the present setting. A simple counterexample is the boundary model without covariates,
in which case both Bayes and MLE are functions of the minimum order statistics. The
nonparametric bootstrap is known to fail in this case (e.g. Horowitz (2000)).

We discuss a subsampling and an analytical approach to confidence intervals. We
believe subsampling is a more practical and computationally simpler method. Following
Politis, Romano, and Wolf (1999), let $W_1, \dots, W_{N_n}$ be equal to the $N_n = \binom{n}{b}$ subsets of
size $b$ of $\{(Y_i, X_i),\ i \le n\}$, ordered in any fashion. Let $I_1, \dots, I_B$ be chosen randomly with or
without replacement from $\{1, 2, \dots, N_n\}$. Now, let $\hat\theta_{n,b,i}$ be equal to the statistic of interest
$\hat\theta_b$, evaluated at the data set $W_i$. The approximation to the limit distribution function of
$\tau_n(\hat\theta_n - \theta)$, where $\tau_n = n$ for the location parameters or $\tau_n = \sqrt{n}$ for the shape parameters,
is given by

$$L_{n,b}(x) = \frac{1}{B}\sum_{i=1}^{B} 1\left\{\tau_b(\hat\theta_{n,b,I_i} - \hat\theta_n) \le x\right\}.$$

By inverting $L_{n,b}(x)$, we obtain various $\alpha$-quantiles $c_{n,b}(\alpha) = L_{n,b}^{-1}(\alpha)$. The level $1 - \alpha$
two-sided confidence interval is obtained as $[\hat\theta_n - \tau_n^{-1}c_{n,b}(1 - \alpha/2),\ \hat\theta_n - \tau_n^{-1}c_{n,b}(\alpha/2)]$.
Similarly, the empirical distribution of $\tau_b|\hat\theta_{n,b,I_i} - \hat\theta_n|$ can be used to construct symmetric
confidence intervals.

Corollary 2 The subsampling method of estimating the limit distribution of $\tau_n(\hat\theta_n - \theta)$,
where $\hat\theta_n$ is Bayes or MLE, is consistent in the sense of Politis, Romano, and Wolf (1999)
(Theorem 2.2.1 (i)-(iii)), and the asymptotic coverage probability of the confidence intervals
achieves the correct nominal value, as long as $b \to \infty$, $b/n \to 0$, and $B \to \infty$, as $n \to \infty$.
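The equal-tailed subsampling interval described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' program: the helper name `subsample_ci`, its `rate` argument, and the toy data (a covariate-free boundary model with true boundary 1, estimated by the sample minimum at rate $\tau_n = n$) are all hypothetical. Each of the $B$ subsets is drawn at random without replacement from the data, which corresponds to sampling the indices $I_1, \dots, I_B$ with replacement.

```python
import numpy as np

def subsample_ci(y, estimator, rate, b, B=500, alpha=0.05, seed=0):
    """Equal-tailed subsampling interval for a scalar parameter.

    rate(k) is the convergence rate tau_k: k for a boundary (location)
    parameter, sqrt(k) for a regular (shape) parameter.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    theta_n = estimator(y)
    # tau_b * (theta_hat_{n,b,i} - theta_hat_n) over B random subsets of size b
    stats = np.array([
        rate(b) * (estimator(rng.choice(y, size=b, replace=False)) - theta_n)
        for _ in range(B)
    ])
    # quantiles c_{n,b}(alpha/2) and c_{n,b}(1 - alpha/2) of L_{n,b}
    c_lo, c_hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return theta_n - c_hi / rate(n), theta_n - c_lo / rate(n)

rng = np.random.default_rng(1)
y = 1.0 + rng.exponential(1.0, size=400)   # toy boundary model, true theta = 1
lo, hi = subsample_ci(y, np.min, rate=lambda k: k, b=40)
print(lo, hi)
```

Because a subsample minimum can never fall below the full-sample minimum, the interval for this estimator lies at or below the point estimate. The same feature is visible in the equal-tail intervals reported for the MLE of the boundary parameter in the empirical section.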

The choice of the block size $b$ is discussed in detail in Chapter 9 of Politis, Romano, and
Wolf (1999). They provide the calibration and the minimum volatility methods. In the
empirical section, we use 1/10 of the sample size. The confidence intervals are not sensitive
to block size variation. This is probably due to the fast rate of convergence to the limiting
distribution. The insensitivity principle underlies the minimum volatility method.

An alternative is an analytical method, based on simulating the distribution of a Poisson
process $\mathbf{N}$, taking the estimated parameters as given, then obtaining $Q_\infty$ and computing
the solutions $Z_\infty$. This method is detailed in Chernozhukov (1999). In the present context,
subsampling is preferable on computational grounds.

Computational Methods. Modern computational methods are important for making
the inference and estimation methods available to practitioners. For many years, Bayes
computations were cumbersome, which hampered their applicability.
Since approximately 1990, this problem has been overcome by Markov Chain Monte
Carlo (MCMC). This technique allows the simulation of a Markov chain $Z_1, \dots, Z_J$ whose
marginal distributions are approximately the posterior distribution. The method allows
efficient and numerically stable computation of the Bayes estimates. Detailed discussion
can be found e.g. in Robert and Casella (1998).
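To make the MCMC idea concrete, here is a minimal random-walk Metropolis sketch. It is not the authors' C implementation: the sampler `rw_metropolis`, the step size, and the toy model ($Y = \theta + \mathrm{Exp}(1)$ with a flat prior, so that the posterior mean is known to be $\min_i Y_i - 1/n$) are assumptions chosen so the output can be checked against a closed form.

```python
import numpy as np

def rw_metropolis(log_post, start, step, draws, seed=0):
    """Random-walk Metropolis sampler for a scalar parameter."""
    rng = np.random.default_rng(seed)
    chain = np.empty(draws)
    current, current_lp = start, log_post(start)
    for t in range(draws):
        proposal = current + step * rng.normal()
        proposal_lp = log_post(proposal)
        # accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[t] = current
    return chain

rng = np.random.default_rng(2)
n = 200
y = 1.0 + rng.exponential(1.0, size=n)   # toy boundary model, true theta = 1
y_min = y.min()

def log_post(theta):
    # flat prior; the likelihood is exp(-sum(y - theta)) on theta <= min(y)
    return n * (theta - y_min) if theta <= y_min else -np.inf

chain = rw_metropolis(log_post, start=y_min - 1.0 / n, step=2.0 / n, draws=50000)
posterior_mean = chain[5000:].mean()      # discard the first draws as burn-in
print(posterior_mean, y_min - 1.0 / n)
```

The discontinuous likelihood causes no difficulty for the sampler: proposals beyond the boundary simply receive zero posterior density and are rejected.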

We implement these computational methods for the models considered in this paper.
The programs are available from the authors. Our implementation is fast since the main
subroutines are coded in C. Our program uses an uninformative (flat over $\mathbb{R}^d$) prior. To
compute the MLE, we use simulated annealing, an algorithm that handles general nonsmooth
objective functions. Therefore, we provide not only the theory, but also the tools and
computer programs needed for implementation.

Table 2: Summary Statistics

Worktype   #auctions   Avg. winning bid   Stdev. winning bid   Avg. # bidders
                       (1989$, mil)
2          141         1.006              1.149                5.91
3          181         1.500              1.870                8.59
4          405         5.015              9.497                7.46

7     Empirical  Illustration 

We consider a data set of bids submitted in procurement contract auctions conducted by
the New Jersey Department of Transportation (NJDOT) in the years 1989-1997. Over this
period, the NJDOT conducted 1025 low-price, sealed-bid auctions of contracts to procure
various types of services such as highway work, bridge construction and maintenance, and
road paving. Most of the services procured had few auctions conducted. In the following,
we consider only three types of services. See Table 2 for the summary statistics. Hong
and Shum (2000) give a detailed description of the data.

We focus on the independent private value model formulated in Paarsch (1992). In
particular we assume that the construction costs follow independent Pareto distributions, as
studied by Paarsch (1992) and Donald and Paarsch (2000). Precisely, the cost distribution
for construction companies is given by, for $\theta = (\theta_1, \theta_2)$:

$$h(c) = \frac{\theta_2\,\theta_1^{\theta_2}}{c^{\theta_2+1}}, \qquad 0 < \theta_1 < c,\ 0 < \theta_2.$$

Paarsch (1992) and Donald and Paarsch (2000) showed that this implies the following
density function of the winning bids, conditional on the number of bidders $m$ and other
covariates that affect the distribution parameters $\theta$:

$$f(y \mid m, \theta) = \begin{cases} \dfrac{\theta_2 m \left\{\theta_1\theta_2(m-1)/[\theta_2(m-1)-1]\right\}^{\theta_2 m}}{y^{\theta_2 m+1}} & \text{if } y > \dfrac{\theta_1\theta_2(m-1)}{\theta_2(m-1)-1}, \\[2ex] 0 & \text{otherwise.} \end{cases}$$

Table 3 reports the ML estimates and the Bayesian posterior mean estimates for this
model. The variation in the summary statistics of the winning bid across types of contracts
indicates that the jobs defined in these contracts are very different. Hence we present
separate parameter estimates for each type of contract. Also, we give the 95% equal-tailed
and 95% symmetric confidence intervals constructed using the subsampling method described
earlier.
Table 3: One-sided Pareto Estimates

                                        MLE                                  Bayes
worktype               $\theta_1$     $\theta_2$    logL         $\theta_1$     $\theta_2$
2                      0.0160483      0.57543       -1341.05     0.0789063      0.706494
  equal tail, lower    -0.0140822     0.562586                   0.0455028      0.541959
  equal tail, upper    0.0154688      0.598813                   0.0839134      0.73919
  symmetric, lower     -0.00578062    0.552359                   0.0620981      0.669373
  symmetric, upper     0.0378773      0.598501                   0.0957144      0.743614
3                      0.0593045      0.568535      -1955.05     0.0465869      0.356914
  equal tail, lower    0.0377357      0.56294                    0.017097       0.268854
  equal tail, upper    0.060384       0.589204                   0.0493621      0.357122
  symmetric, lower     0.0396208      0.55182                    0.0264918      0.291758
  symmetric, upper     0.0789883      0.58525                    0.066682       0.42207
4                      0.0222681      0.646935      -7975.9      0.146176       0.555795
  equal tail, lower    0.00208421     0.627295                   0.131181       0.329598
  equal tail, upper    0.0223695      0.666575                   0.157989       0.566001
  symmetric, lower     0.00392716     0.627295                   0.13406        0.368623
  symmetric, upper     0.040609       0.666575                   0.158291       0.742967

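Table 4 reports the parameter estimates from an alternative two-sided model:

$$f(y \mid m, \theta) = \begin{cases} (1-\lambda)\,\dfrac{\theta_2 m \left\{\theta_1\theta_2(m-1)/[\theta_2(m-1)-1]\right\}^{\theta_2 m}}{y^{\theta_2 m+1}} & \text{if } y > \dfrac{\theta_1\theta_2(m-1)}{\theta_2(m-1)-1}, \\[2ex] \ldots & \text{if } 0 < y < \dfrac{\theta_1\theta_2(m-1)}{\theta_2(m-1)-1}, \end{cases}$$

and 0 otherwise. We chose $\lambda = 0.02$, which accommodates outliers that do not conform
to the theoretical model.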

In Table 5 we introduce a continuous covariate for the traffic volume. This covariate is
only available for work type 4. We parameterize $\theta_1 = \exp(\alpha_1 + \alpha_3 \times X)$ and $\theta_2 = \exp(\alpha_2)$,
where $X$ denotes traffic volume. The coefficient appears to be significant, although there
are large discrepancies depending on the estimation method and the model. This is
indicative of misspecification.

Table 4: Two-sided Pareto Estimates

                                        MLE                                  Bayes
worktype               $\theta_1$     $\theta_2$    logL         $\theta_1$     $\theta_2$
2                      0.259243       0.55482       -372.445     0.22572        0.810755
  equal tail, lower    0.180359       0.498768                   0.149349       0.776126
  equal tail, upper    0.272495       0.569737                   0.236047       0.85172
  symmetric, lower     0.188709       0.498771                   0.154353       0.297086
  symmetric, upper     0.329778       0.61087                    0.770756       0.850753
3                      0.418519       0.512854      -656.18      0.3074         0.361401
  equal tail, lower    0.316776       0.453844                   0.203662       0.336442
  equal tail, upper    0.439423       0.529476                   0.31702        0.36304
  symmetric, lower     0.320138       0.491592                   0.208184       0.344999
  symmetric, upper     0.516895       0.534106                   0.406617       0.377803
4                      2.87942        0.535876      -2375.17     1.84626        0.549167
  equal tail, lower    2.46807        0.529865                   1.35817        0.53917
  equal tail, upper    3.1199         0.553112                   1.98644        0.569452
  symmetric, lower     2.55317        0.518958                   1.42778        0.530891
  symmetric, upper     3.20566        0.552793                   2.26474        0.567443

Table 5: Model with Traffic Volume (Type 4)

                                        MLE                                         Bayes
                       $\alpha_1$  $\alpha_2$  $\alpha_3$  logL        $\alpha_1$  $\alpha_2$  $\alpha_3$
one-side               -3.812      -0.436      -0.000      -6996.4     -1.096      -0.488      -0.697
  equal tail, lower    -3.833      -0.440      -0.001                  -1.637      -1.092      -0.941
  equal tail, upper    -3.810      -0.409      0.006                   0.137       -0.453      -0.557
  symmetric, lower     -3.830      -0.453      -0.006                  -2.160      -0.615      -0.911
  symmetric, upper     -3.793      -0.419      0.005                   -0.032      -0.362      -0.484
two-side               -1.131      -0.652      0.547       -1984       -0.540      -0.639      0.091
  equal tail, lower    -2.150      -0.655      0.544                   -0.842      -0.670      0.058
  equal tail, upper    -1.114      -0.634      0.645                   -0.461      -0.588      0.138
  symmetric, lower     -1.820      -0.670      0.451                   -0.812      -0.681      0.049
  symmetric, upper     -0.441      -0.634      0.642                   -0.269      -0.596      0.134

An alternative approach to direct parametric inference for the independent private value
auction model is the indirect inference approach of Guerre, Perrigne, and Vuong (2000).
Their insight is based on examining the first order condition of the optimization problem of
a representative bidder $i$ in equilibrium:

$$\max_b \int_b^\infty g_{-i}(x)\,(b - c)\,dx,$$

$$-g_{-i}(b)\,(b - c) + G_{-i}(b) = 0,$$

where $G_{-i}$ and $g_{-i}$ denote the survival function and the density function of the distribution
of the minimum bid among bidder $i$'s competitors. Therefore, a two-step procedure
can be used. In the first step, $G_{-i}$ and $g_{-i}$ are estimated using the bid data. In the second
step, the pseudo-cost values

$$c = b - \frac{G_{-i}(b)}{g_{-i}(b)}$$

are constructed for each bid in each auction. Then the distribution of these pseudo-values
can be used to infer the latent distribution of the cost parameter. $G_{-i}$ and $g_{-i}$ can be
easily inferred from the data for any symmetric affiliated private value model. It suffices
to observe the winning bid, since

$$G_{-i}(x) = \left(F(x)\right)^{\frac{m-1}{m}} \quad\text{and}\quad g_{-i}(x) = \left(\frac{m-1}{m}\right) F(x)^{-\frac{1}{m}} f(x),$$

where $F(\cdot)$ and $f(\cdot)$ are the survival and density function of the winning bid, respectively.
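The indirect inference approach of Guerre, Perrigne, and Vuong (2000) uses a nonparametric
method to estimate $F(x)$ and $f(x)$. Here we consider a flexible parametric
approach. We used the truncated normal distribution parameterized as:

$$f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right) \Big/ \Phi\!\left(\frac{\mu-\theta}{\sigma}\right), \qquad x \ge \theta,$$

where we take $\ln\theta = \alpha_1 + \alpha_4 \times m$, $\ln\mu = \alpha_2 + \alpha_5 \times m$, $\ln\sigma = \alpha_3 + \alpha_6 \times m$. Table 6 reports
the results for worktype 4.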
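The pseudo-cost inversion in the second step of the two-step procedure can be checked in a case where everything is known in closed form. The sketch below is an assumption-laden toy, not the estimator used for Table 6: costs are uniform on $(0, 1)$, bids follow the equilibrium bid function $b = 1/m + (1 - 1/m)c$ from the Monte Carlo section with $\bar c = 1$, and the true $G_{-i}$ and $g_{-i}$ are plugged in instead of estimates. The function name `pseudo_costs` is hypothetical.

```python
import numpy as np

def pseudo_costs(bids, G, g):
    """Solve the first-order condition -g(b)(b - c) + G(b) = 0 for c."""
    return bids - G(bids) / g(bids)

m = 5                                        # bidders per auction (illustrative)
rng = np.random.default_rng(3)
costs = rng.uniform(0.0, 1.0, size=1000)     # private costs on (0, 1)
bids = 1.0 / m + (1.0 - 1.0 / m) * costs     # equilibrium bid function

# A single rival's bid is uniform on (1/m, 1). The minimum of the m - 1
# rival bids therefore has survival function G and density g below.
def G(x):
    return ((1.0 - x) / (1.0 - 1.0 / m)) ** (m - 1)

def g(x):
    return (m - 1) * ((1.0 - x) / (1.0 - 1.0 / m)) ** (m - 2) / (1.0 - 1.0 / m)

recovered = pseudo_costs(bids, G, g)
print(np.max(np.abs(recovered - costs)))     # numerically zero
```

With estimated rather than known $G_{-i}$ and $g_{-i}$, the recovered values are only approximations to the latent costs, which is why the quality of the first-step fit matters.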

First, we notice that the new model is an improvement over the Pareto model in terms
of the log likelihood value, suggesting a significant improvement in the fit. Although we
did not develop a formal testing procedure for the nonstandard likelihood, the principle
of Vuong (1989) suggests that the model with the higher likelihood is closer to the data
in the information-theoretic sense.

Second, we note that the Bayes posterior mean estimates differ from the maximum
likelihood estimates, although none of the slope coefficients reverses its sign. This is not
surprising, since the likelihood surface has many more modes when the number of parameters
is greater. Based on the computational experiments, theoretical efficiency properties,
and the fact that the Bayes estimators (posterior means) minimize a globally convex
objective function, they may be preferred.

We did not report the results for the case with the traffic volume covariate. Adding this
covariate produced a tiny improvement of the log likelihood and did not change the original
coefficients. The estimates for the traffic volume coefficients were highly insignificant. Both
methods yielded results that agree concerning that covariate.

Overall, the two-sided models fit the data better than the one-sided ones, indicating that
controlling for outliers that do not conform to the model is important. We also observe that the
parametric variant of the indirect inference approach of Guerre, Perrigne, and Vuong
(2000) is quite valuable and fits this particular data set better than the Pareto model.

Table 6: Truncated Normal Model for Worktype 4

MLE (logL = -694.854)
worktype 4             $\alpha_1$  $\alpha_2$  $\alpha_3$  $\alpha_4$   $\alpha_5$  $\alpha_6$
estimate               -2.38883    0.404125    0.628959    -0.0702671   -2.58986    0.207458
  equal tail, lower    -2.56234    -2.39153    -0.23598    -0.14270     -2.61471    0.167639
  equal tail, upper    -2.3802     0.54492     0.80567     -0.04021     -2.5829     0.23593
  symmetric, lower     -2.55203    -2.07808    0.15484     -0.12891     -2.60658    0.168674
  symmetric, upper     -2.22562    2.88633     1.10308     -0.01162     -2.57313    0.24624

Bayes
worktype 4             $\alpha_1$  $\alpha_2$  $\alpha_3$  $\alpha_4$   $\alpha_5$  $\alpha_6$
estimate               4.40077     -2.41695    0.643259    -2.38707     -2.26628    0.211352
  equal tail, lower    3.84919     -2.93896    0.462919    -2.60164     -2.56957    0.172165
  equal tail, upper    4.87133     -1.88786    1.047       -1.8852      -1.77164    0.23446
  symmetric, lower     3.88809     -2.94462    0.347846    -2.8686      -2.72846    0.182684
  symmetric, upper     4.91346     -1.88929    0.938672    -1.90554     -1.80411    0.24002

Conclusion

We studied a general model in which the conditional density of the dependent variable
jumps at a location that is parameter dependent. This includes a variety of the
boundary-dependent models discussed in the recent literature on structural estimation. We derive
asymptotic distributions of Bayes and ML estimators under general conditions, and offer
practical computation and inference methods. The results provide a solution to a
long-standing econometric problem.

Our results extend previous work in several directions: (1) handling general regression
models; (2) inclusion of Bayes estimators, which enjoy small and large sample
efficiency; (3) considering the two-sided model as a robust alternative to the one-sided model;
and (4) using point process methods to give a precise characterization of large sample
distributions. Bayes estimators are important alternatives to MLE due to efficiency
properties which the MLE does not share. The methodology in this paper also provides new
insights into the analysis of asymptotic distributions for models that cannot be studied
using the conventional tools of locally asymptotically normal models.

The empirical application presented illustrates the usefulness of the results. We
estimated some key auction models. The first model was a stylized Pareto model of Paarsch
(1992). The second auction model represented a flexible parametric alternative to the
nonparametric approach of Guerre, Perrigne, and Vuong (2000). We find that the two-sided
models fit the data better than the one-sided ones, indicating that controlling for outliers that do
not conform to the model is important. We also find that a parametric variant of the indirect
inference approach of Guerre, Perrigne, and Vuong (2000) is quite valuable and may
fit the data better than the direct parametric approach.
A     Useful Background Definitions

A.1     Point Processes

Definition 1 (Point Measures, $M_p(E)$, cf. Resnick (1987)) Let $E$ be a locally compact
topological space with a countable basis, and let $\mathcal{E}$ be the Borel $\sigma$-algebra of subsets of $E$. A point
measure (p.m.) $p$ on $(E, \mathcal{E})$ is a measure of the following form: for $\{x_i, i \ge 1\}$, a countable
collection of points (called points of $p$), and any set $A \in \mathcal{E}$: $p(A) = \sum_i 1(x_i \in A)$. If $p(K) < \infty$
for any compact $K \subset E$, then $p$ is said to be Radon. A p.m. $p$ is simple if $p(\{x\}) \le 1\ \forall x \in E$,
and is compound otherwise. Let $M_p(E)$ be the collection of all Radon point measures. A sequence
$\{p_n\} \subset M_p(E)$ converges vaguely to $p$ if $\int f\,dp_n \to \int f\,dp$ for all functions $f \in C_K(E)$ [continuous,
real-valued, and vanishing outside a compact set] (cf. Leadbetter, Lindgren, and Rootzen (1983)).
Vague convergence induces the vague topology on $M_p(E)$. The topological space $M_p(E)$ is metrizable as
a complete separable metric space. $M_p(E)$ denotes this metric space hereafter. Define $\mathcal{M}_p(E)$ to
be the $\sigma$-algebra generated by open sets.

Definition 2 (Point Processes: Convergence in Distribution.) A point process in $M_p(E)$
is a measurable map $N : (\Omega, \mathcal{F}, P) \to (M_p(E), \mathcal{M}_p(E))$, i.e. for every elementary event $\omega \in \Omega$,
the realization of the point process $N(\omega)$ is some point measure in $M_p(E)$. Weak convergence
of the point process $N_n$ taking values in $M_p(E)$ is the same as for any metric space, cf.
Resnick (1987): we shall write $N_n \Rightarrow N$ in $M_p(E)$ if $E_P h(N_n) \to E_P h(N)$ for all continuous
and bounded functions $h$ mapping $M_p(E)$ to $\mathbb{R}$. Note that if $N_n \Rightarrow N$ in $M_p(E)$, then
$\int_E f(x)\,dN_n(x) \xrightarrow{d} \int_E f(x)\,dN(x)$ for any $f \in C_K(E)$ by the continuous mapping theorem.

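Definition 3 (Poisson Point Process or Random Measure (PRM)) A point process $N$ is a
PRM with mean intensity measure $m$ (defined on $(E, \mathcal{E})$), if
(a) for any $F \in \mathcal{E}$, and any non-negative integer $k$,

$$P(N(F) = k) = \begin{cases} e^{-m(F)}\,m(F)^k / k! & \text{if } m(F) < \infty, \\ 0 & \text{if } m(F) = \infty, \end{cases}$$

(b) if $\{F_i, i \le k\}$ are disjoint sets in $\mathcal{E}$, then $(N(F_i), i \le k)$ are independent random variables.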

A.2     Convex Objectives

The result can be found in Knight (1999). It is a generalization of earlier convexity lemmas by
Knight and Pollard. It allows discontinuities and $\bar{\mathbb{R}}$-valued objective functions.

Lemma 1 (Geyer) Suppose $\{Q_T\}$ is a sequence of lower-semi-continuous (lsc) convex $\bar{\mathbb{R}}$-valued
random functions, defined on $\mathbb{R}^d$, and let $\mathcal{D}$ be a countable dense subset of $\mathbb{R}^d$. If $Q_T$ fidi-converges
to $Q_\infty$ in $\bar{\mathbb{R}}$ on $\mathcal{D}$, where $Q_\infty$ is lsc convex and finite on an open non-empty set a.s., then

$$\arg\min_z Q_T(z) \xrightarrow{d} \arg\min_z Q_\infty(z),$$

provided the latter is uniquely defined a.s. in $\mathbb{R}^d$.

A.3     Stochastic Equisemicontinuity

The remarkable concepts of this section were recently developed by Knight (1999). The following
summarizes some essential elements we need.

Epi-Convergence. Suppose the sequence of objectives $\{Q_n\}$ are random lower semi-continuous
(l-sc) functions (i.e. $Q_n(x) \le \liminf_{x_j \to x} Q_n(x_j)$, $\forall x, \forall x_j \to x$). Let $\mathcal{L}$ be the space of l-sc
functions $f : \mathbb{R}^d \to \bar{\mathbb{R}}$, s.t. $f \not\equiv \infty$. $\mathcal{L}$ can be made into a complete separable metric space by
considering a special metric, convergence in which is equivalent to epi-convergence (cf. Knight (2000),
Rockafellar and Wets (1998)). Hence one can metrize the weak convergence in $\mathcal{L}$: $Q_n$ is said to
epi-converge in distribution to $Q$ if for any closed rectangles $R_1, \dots, R_k$ in $\mathbb{R}^d$ with open interiors
$R_1^o, \dots, R_k^o$, and any real $r_1, \dots, r_k$:

$$P\Big(\bigcap_{j=1}^k \Big\{\inf_{x \in R_j} Q(x) > r_j\Big\}\Big) \le \liminf_{n\to\infty} P\Big(\bigcap_{j=1}^k \Big\{\inf_{x \in R_j} Q_n(x) > r_j\Big\}\Big)$$
$$\le \limsup_{n\to\infty} P\Big(\bigcap_{j=1}^k \Big\{\inf_{x \in R_j^o} Q_n(x) \ge r_j\Big\}\Big) \le P\Big(\bigcap_{j=1}^k \Big\{\inf_{x \in R_j^o} Q(x) \ge r_j\Big\}\Big).$$

Epi-convergence is a weak condition that leads to the convergence of argmins.
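Lemma 2 (Knight, Theorem 1) Suppose that

i. $\hat Z_n$ is s.t. $Q_n(\hat Z_n) \le \inf_{z \in \mathbb{R}^d} Q_n(z) + \epsilon_n$, $\epsilon_n \searrow 0$; $\hat Z_n = O_p(1)$;

ii. $Z_\infty = \arg\min_{z \in \mathbb{R}^d} Q_\infty(z)$ is uniquely defined in $\mathbb{R}^d$ a.s.;

iii. $Q_n(\cdot)$ epi-converges in distribution to $Q_\infty(\cdot)$; then

$$\hat Z_n \xrightarrow{d} Z_\infty.$$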

Epi-convergence is more general than uniform convergence, because it allows for rather general
discontinuities. In our case, (lots of) non-vanishing discontinuities make the uniform convergence
of the likelihood function impossible.

Provided the finite dimensional distributional (fidi) limit exists, the necessary and sufficient
condition for epi-convergence in distribution is stochastic equi-lower-semi-continuity (s. e-l-sc),
developed by Knight (1999).

Stochastic equi-semi-continuity. A sequence $\{Q_n\} \in \mathcal{L}$ is s. e-l-sc if for each bounded set $B$,
$\epsilon > 0$, and $\delta > 0$, there exist $u_1, \dots, u_k \in B$ and some open sets $V(u_1), \dots, V(u_k)$ covering $B$ and
containing $u_1, \dots, u_k$ s.t.

$$\limsup_{n\to\infty} P\Big(\bigcup_{j=1}^k \Big\{\inf_{x \in V(u_j)} Q_n(x) < \min\big(\epsilon^{-1}, Q_n(u_j) - \epsilon\big)\Big\}\Big) < \delta.$$

Lemma 3 (Knight, Thm 2) Suppose $Q_n$ is s. e-l-sc. Then $\{Q_n\}$ converges to $Q_\infty$ in distribution
in the finite-dimensional sense if and only if $\{Q_n\}$ epi-converges in distribution to $Q_\infty(\cdot)$.

The s. e-l-sc condition amounts to the possibility of approximating the distribution of the infimum
(and hence of the argmin) of $Q_n$ over a bounded set $B$ by an approximate minimum of $Q_n$ over a
carefully chosen grid $\{u_1, \dots, u_k\}$, with given precision $(\epsilon, \delta)$. It is worth emphasizing that naive
uniform grids will not do the job in our case. The application of stochastic equisemicontinuity
leads to a simple proof even in the no-covariate case, which could improve (in terms of length)
over the arguments of Ibragimov and Hasminskii.

B     Proofs  for  the  Linear  Model 

In the proof we set the local parameter sequence $\beta_n = \beta_0$. Putting through the general local
sequence $\beta_n$ does not change the arguments but introduces a lot of notational complexity.
B.1     Proof of Theorem 1.

Consider the local log likelihood ratio process $Q_n(z) = \ln L_n(\beta_0 + z/n)/L_n(\beta_0)$:

$$Q_n(z) = \sum_{i=1}^n q_{in}(z) \times \left[1(\epsilon_i > X_i'z/n \vee 0) + 1(\epsilon_i < X_i'z/n \wedge 0)\right]$$
$$\qquad + \sum_{i=1}^n q_{in}(z) \times \left[1(0 < \epsilon_i < X_i'z/n) + 1(0 > \epsilon_i > X_i'z/n)\right]$$
$$= Q_{1n}(z) + Q_{2n}(z),$$

where

$$q_{in}(z) = \ln\left[f(Y_i - X_i'(\beta_0 + z/n) \mid X_i, \beta_0 + z/n)\,/\,f(Y_i - X_i'\beta_0 \mid X_i, \beta_0)\right].$$

$Q_{1n}(z)$ and $Q_{2n}(z)$ behave very differently.

I. The limit of $Q_{1n}(z)$ is analyzed by techniques similar to those in IH (1982). $\{Q_{1n}(z)\}$ is an
average-like statistic. By C.1, C.4 and the LLN, it converges in probability for each $z$:

$$Q_{1n}(z) \xrightarrow{p} Q_{1\infty}(z) = -z'\,E\left[X\,\frac{f'(\epsilon_i \mid X, \beta_0)}{f(\epsilon_i \mid X, \beta_0)}\right]. \qquad (6)$$

For a density function $f$ that has a dominated derivative everywhere except at 0,
$\int_{\mathbb{R}\setminus\{0\}} f'(u)\,du = -f(0^+) + f(0^-)$. Thus applying this to the conditional density $f(\epsilon \mid X, \beta_0)$, we have:

$$Q_{1\infty}(z) = z'\,E X\left[p(X) - q(X)\right].$$

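Next we use stochastic equi-continuity to convert pointwise convergence to uniform convergence
in $z$. It suffices to show that for any $|z_1 - z_2| \to 0$, the term

$$\Big|\sum_{i=1}^n q_{in}(z_1) \times 1(\epsilon_i > X_i'z_1/n \vee 0) - \sum_{i=1}^n q_{in}(z_2) \times 1(\epsilon_i > X_i'z_2/n \vee 0)\Big| \xrightarrow{p} 0, \qquad (7)$$

as well as other similar terms in $Q_n(z)$. The left hand side of (7) is bounded by

$$\sum_{i=1}^n 1\Big(\epsilon_i > \tfrac{X_i'z_1}{n} \vee \tfrac{X_i'z_2}{n} \vee 0\Big) \times \Big|\ln f\Big(\epsilon_i - \tfrac{X_i'z_1}{n} \,\Big|\, \beta_0 + z_1/n\Big) - \ln f\Big(\epsilon_i - \tfrac{X_i'z_2}{n} \,\Big|\, \beta_0 + z_2/n\Big)\Big|$$
$$\qquad + \sum_{i=1}^n 1\Big(\epsilon_i \in \big(0, \tfrac{X_i'z_1}{n} \vee \tfrac{X_i'z_2}{n}\big]\Big) \times [\,\cdots\,]. \qquad (8)$$

By C.4, the first sum in (8) is bounded by, for $z^*$ in the convex hull of $z_1$ and $z_2$,

$$\sum_{i=1}^n 1\Big(\epsilon_i > \tfrac{X_i'z_1}{n} \vee \tfrac{X_i'z_2}{n} \vee 0\Big) \times \Big|\frac{[\,\cdots\,]}{f(\epsilon_i - X_i'z^*/n \mid \beta_0 + z^*/n)}\Big| \times \Big|\frac{z_1 - z_2}{n}\Big|$$
$$\le \frac{1}{n}\sum_{i=1}^n 1(\epsilon_i > 0) \times \mathrm{const} \times \Big|\frac{[\,\cdots\,]}{f(\epsilon_i \mid \beta_0)}\Big| \times |z_1 - z_2| = O_p(1) \times |z_1 - z_2|.$$

The second sum in (8) is bounded by, for some $C, C', C'' > 0$,

$$\sup_{z_1, z_2 \in Z} \sum_{i=1}^n 1\Big(\epsilon_i \in \big(0, \tfrac{X_i'z_1}{n} \vee \tfrac{X_i'z_2}{n}\big]\Big) \times C \times [\,\cdots\,]$$
$$\le \sup_{z_1, z_2 \in Z} \frac{1}{n}\sum_{i=1}^n 1\big(\epsilon_i \in (0, C'/n]\big) \times C'' = o_p(1), \qquad (9)$$

since $f'(u \mid x)$ is uniformly bounded above and $f(u \mid x)$ uniformly bounded below when $\epsilon > u > 0$
by C.2-C.3, and $X$ has a compact support. Thus (7) follows. Other terms are similarly checked.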
II. Limit of $Q_{2n}(z)$. $Q_{2n}$ is driven by the "rare" occurrences of the near-to-jump observations
and can be modeled as an integral w.r.t. the point process that measures these occurrences.
We split the proof into two parts: Step 1 constructs the key point process and derives its limit
representation. Step 2 shows that $(Q_{2n}(z_j), j \le J)$ is a continuous transformation of the point
process.

Step 1: The Key Point Process. Define $E = ((-\infty, 0) \cup (0, +\infty)) \times \mathbf{X}$. The topology on
$E$ is taken to be the product of the standard topologies of $\mathbb{R} \setminus \{0\}$ and $\mathbb{R}^d \cap \mathbf{X}$, so that, e.g., $[a, b] \times \mathbf{X}$
is a compact subset of $E$. $\mathcal{E}$ is the Borel $\sigma$-algebra of subsets of $E$.

The key point process is a random measure taking the following form:

$$\hat N(A) = \sum_{i=1}^n 1\left[(n\epsilon_i, X_i) \in A\right],$$

for any set $A$ in $\mathcal{E}$. $\hat N$ is a random element of $M_p(E)$, the metric space of nonnegative point
measures on $E$, with the metric generated by the topology of vague convergence. Appendix A
gives definitions. We will show that

$$\hat N \Rightarrow \mathbf{N} \text{ in } M_p(E),$$

where $\mathbf{N}$ is defined in Proposition 1. This is done in steps (a) and (b).

(a) By C.1 and C.3, for any $F \in \mathcal{T}$, the basis of relatively compact open sets in $E$ (finite unions and
intersections of open bounded rectangles in $E$),

$$\lim_{n\to\infty} E\hat N(F) = \lim_{n\to\infty} nP\big((n\epsilon, X) \in F\big) = \int_F \left[p(x)1(u > 0)\,du + q(x)1(u < 0)\,du\right] \times dF_X(x) = m(F) < \infty,$$

where the measure $m$ is defined in the proposition. Since the events $\{(n\epsilon_i, X_i) \in F\}$ are independent
across $i$ by C.1, by Meyer's Lemma (Meyer (1973)) we also have:

$$\lim_{n\to\infty} P(\hat N(F) = 0) = e^{-m(F)},$$

which by Kallenberg's Theorem^1 [$N$ is clearly simple a.s.] and the definition of the Poisson
process (Appendix A) implies that $\hat N \Rightarrow N$ in $M_p(E)$, where $N$ is a Poisson point process with
the mean measure $m(\cdot)$.
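The void-probability limit used in step (a) is easy to check numerically. The sketch below rests on assumptions made purely for illustration: there are no covariates, the errors are $\mathrm{Exp}(1)$ so that the density just above the jump is $p = 1$, and $F = (0, a)$, for which $m(F) = a$. The function name `void_frequency` is hypothetical.

```python
import numpy as np

def void_frequency(n, a, reps, seed=0):
    """Share of samples with no point of {n * eps_i} in (0, a), and the mean count."""
    rng = np.random.default_rng(seed)
    eps = rng.exponential(1.0, size=(reps, n))   # density p = 1 just above the jump at 0
    counts = np.sum(n * eps < a, axis=1)         # N_hat((0, a)) in each sample
    return np.mean(counts == 0), counts.mean()

a = 1.5
freq, mean_count = void_frequency(n=200, a=a, reps=20000)
print(freq, np.exp(-a), mean_count)
```

The simulated share of empty sets should be close to $e^{-a}$, and the mean count close to $a$, in line with a Poisson limit with mean measure $m(F) = a$.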

(b).  Next  we  show  that  N  has  the  same  distribution  as  the  process  N,  stated  in  the  proposition. 
First,  define  canonical  homogeneous  PRMs  No  and  Nq  with  points  {Pj}  aind  {P;},  defined  in  the 
proposition.  No  has  the  mean  measure  mo(du)  =  du  on  (0, 00),  and  No  has  the  mean  measure 
77io(d'u)  =  du  on  (— oo,0).  Now  because  No  and  No  are  independent, 

Ni(-)=No(-)+No(-)' 

is  the  Poisson  point  process  with  the  mean  measure 

mi{du)  =  \(u>  Q)du  +  \(u  <  Q)du  on  R\  {0}. 

Because  {A",,  A"/}  are  i.i.d.  and  independent  of  {FijPj},  by  the  Composition  Lemma(cf.  Proposi- 
tion 3.8  in  Resnick  (1987)),  the  composed  PRM  N2  with  points 

{{Ti,Xi},{T'i,Xl},i>l,j>\)  inR"* 


^For  example,  Resnick  (1987),  Prop.  3.22 
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has  the  product  mean  intensity  measure  given  by 

m2{du,dx)  =  [l(u  >  0)du  +  l(ii  <  0)du]  x  Fx{dx)  on  R  \  {0}  x  X. 
Finally,  N  with  the  transformed  points  {T{r,,X,),T{r',,X-)},  where 

T:{u,x)>-^(l{u>0)-^  +  l{u<0)^,    x) 

has  the  desired  mean  measure  on  E 

■m{dj,dx)  =  m2oT~\dj,dx)  =  \p{x)l{j  >  0)  +  q{x)l{j  <0)]dj  x  Fx{dx), 

by  the  Transformation  Theorem  for  Poisson  Processes(  Proposition  3.7  in  Resnick  (1987)). 
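
The void-probability limit used in step (a) can be checked numerically. The sketch below is only an illustration under simplifying assumptions that are not in the paper: the jump heights $p$ and $q$ are constants (no dependence on $x$), and each scaled point falls in $F$ with probability exactly $m(F)/n$. The function names are hypothetical.

```python
import math

def mean_measure_interval(a, b, p, q):
    """m((a, b) x X) when the jump heights p (right of 0) and q (left of 0) are constants."""
    right = max(0.0, b) - max(0.0, a)   # length of (a, b) intersected with (0, inf)
    left = min(0.0, b) - min(0.0, a)    # length of (a, b) intersected with (-inf, 0)
    return p * right + q * left

def void_probability(n, mass):
    """P(no scaled point in F) for n independent points, each landing in F w.p. mass / n."""
    return (1.0 - mass / n) ** n

# Assumed illustrative values: F = (-1, 2) x X, p = 1.5, q = 0.5.
m = mean_measure_interval(-1.0, 2.0, p=1.5, q=0.5)
for n in (10, 100, 10_000):
    # The binomial void probability approaches the Poisson value exp(-m(F)).
    print(n, void_probability(n, m), math.exp(-m))
```

As $n$ grows the two printed columns agree to more digits, which is the content of the limit $P(\hat N(F) = 0) \to e^{-m(F)}$ in this simplified setting.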
Step 2: The Functional of the Key Point Process. Here we distinguish two cases: (a) $X = \{x \in X : q(x) > \delta > 0\}$ and (b) $X = \{x \in X : q(x) = 0\}$. Note that by assumption C.3, for any compact set $Z$, as $n \to \infty$,

$$\ln \frac{f(\delta - x'z/n \mid x, \beta_0 + z/n)}{f(\delta \mid x, \beta_0)} = \ln \frac{q(x)}{p(x)}\,(1 + O(n^{-1})) \tag{10}$$

uniformly in $\{(\delta, z, x) \in \mathbb{R}_+ \times Z \times X : x'z > 0,\ 0 < \delta < x'z/n\}$, and

$$\ln \frac{f(\delta - x'z/n \mid x, \beta_0 + z/n)}{f(\delta \mid x, \beta_0)} = \ln \frac{p(x)}{q(x)}\,(1 + O(n^{-1})) \tag{11}$$

uniformly in $\{(\delta, z, x) \in \mathbb{R}_- \times Z \times X : x'z < 0,\ 0 > \delta > x'z/n\}$. Note that in case (a) this holds by C.3, while in case (b) equation (10) holds identically equal to $-\infty$. Hence

$$Q_{2n}(z) = \left[\sum_{i=1}^n \ln\frac{q(X_i)}{p(X_i)}\,1[0 < n\epsilon_i < X_i'z] + \sum_{i=1}^n \ln\frac{p(X_i)}{q(X_i)}\,1[0 > n\epsilon_i > X_i'z]\right] \times (1 + o_p(1)) \equiv \tilde Q_{2n}(z) \times (1 + o_p(1)) \tag{12}$$

uniformly in $z$ over $Z$. (Again this expression may equal $-\infty$ in case (b), but that does not create a problem for the proof.) Write $\tilde Q_{2n}(z)$ as an integral w.r.t. $\hat N$:

$$\tilde Q_{2n}(z) = \int_E l_z(j,x)\,d\hat N(j,x), \tag{13}$$

where $l_z(j,x)$ is defined in the theorem. We consider cases (a) and (b) separately:
(a): By conditions C.1-2, the function $(j,x) \mapsto l_z(j,x)$ is bounded and vanishes outside the set $K_z \equiv [-\eta, +\eta] \times X$, $\eta \equiv \sup_{x \in X} |x'z|$, where $\eta < \infty$ by C.1. $K_z$ is a compact set in $E$ (by the standard compactification of $E$). The function $(j,x) \mapsto l_z(j,x)$ is discontinuous at $j = 0$ and at $j = x'z$. Next define the map $T : M_p(E) \to \mathbb{R}^l$ as

$$N \mapsto \left(\int_E l_{z_k}(j,x)\,dN(j,x),\ k \le l\right).$$

The mapping $T$ is discontinuous at the set

$$D(T) \equiv \{N \in M_p(E) : j_i^N = 0 \text{ or } j_i^N = z_k' x_i^N \text{ for some } i \ge 1,\ k \le l\},$$

where $(j_i^N, x_i^N, i \ge 1)$ denote the points of $N$. By C.3 and the construction of $N$,

$$P[\hat N \in D(T) \text{ for some } n \ge 1] = 0, \qquad P[N \in D(T)] = 0. \tag{14}$$

Therefore $\hat N \Rightarrow N$ in $M_p(E)$ implies $T(\hat N) \to_d T(N)$. We conclude that

$$(\tilde Q_{2n}(z_k), k \le l) \to_d (Q_{2\infty}(z_k), k \le l), \tag{15}$$

where $Q_{2\infty}(z) = \int_E l_z(j,x)\,dN(j,x)$.

(b): Now consider the second case: $\ell_{2n}(z) = \exp\{\tilde Q_{2n}(z)\}$. Note that

$$\ell_{2n}(z) = 0 \text{ if } \hat N(A(z)) > 0, \qquad \ell_{2n}(z) = 1 \text{ if } \hat N(A(z)) = 0,$$

where

$$A(z) = \{(j,x) \in \mathbb{R}_+ \times X : j < x'z\}.$$

Observe also that $\ell_{2\infty}(z) = 0$ if $N(A(z)) > 0$, and $\ell_{2\infty}(z) = 1$ if $N(A(z)) = 0$. Thus to show finite-dimensional convergence (for $A_k = 0$ or $1$):

$$P(\ell_{2n}(z_k) = A_k, k \le K) \to P(\ell_{2\infty}(z_k) = A_k, k \le K),$$

it suffices to show $(\hat N(A(z_k)), k \le K) \to_d (N(A(z_k)), k \le K)$. By the continuous mapping theorem, this follows from $\hat N \Rightarrow N$ in $M_p(E)$, since $\hat N(\partial A(z_k)) = 0$ and $N(\partial A(z_k)) = 0$ a.s. ■
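
To make the limit object concrete, the following sketch draws one realization of $Q_{2\infty}(z)$ in case (a) by building the limit points as in Step 1(b): partial sums of unit exponentials, rescaled by $p(x)$ and $q(x)$. Everything specific here is an assumption for illustration only: the functions `p`, `q`, `draw_x`, the truncation to `n_points` points per side, and the form of $l_z$, which is read off from (12) rather than from the theorem statement.

```python
import numpy as np

def simulate_q2_limit(z, p, q, draw_x, n_points=200, seed=0):
    """One draw of Q_2inf(z), summing l_z over simulated points of the limit process."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    # Canonical homogeneous PRM points: partial sums of unit exponentials.
    gam_pos = np.cumsum(rng.exponential(size=n_points))    # points on (0, inf)
    gam_neg = -np.cumsum(rng.exponential(size=n_points))   # points on (-inf, 0)
    x_pos = draw_x(rng, n_points)   # i.i.d. covariates attached to the points
    x_neg = draw_x(rng, n_points)
    # Transformation T: rescale by the density heights p(x) and q(x).
    j_pos = gam_pos / p(x_pos)
    j_neg = gam_neg / q(x_neg)
    # l_z(j, x) = ln(q/p) 1[0 < j < x'z] + ln(p/q) 1[0 > j > x'z], as in (12).
    hit_pos = (j_pos > 0) & (j_pos < x_pos @ z)
    hit_neg = (j_neg < 0) & (j_neg > x_neg @ z)
    return (np.sum(np.log(q(x_pos[hit_pos]) / p(x_pos[hit_pos])))
            + np.sum(np.log(p(x_neg[hit_neg]) / q(x_neg[hit_neg]))))

# Hypothetical model ingredients (not from the paper).
def p(x):
    return 1.0 + 0.5 * x[:, 1]

def q(x):
    return np.full(x.shape[0], 0.25)

def draw_x(rng, m):
    return np.column_stack([np.ones(m), rng.uniform(0.0, 1.0, size=m)])

print(simulate_q2_limit([2.0, 0.0], p, q, draw_x))
```

Because the same seed reproduces the same points for every `z`, evaluating the function on a grid of `z` values traces out one piecewise-constant sample path. The truncation is harmless only while $p(x)x'z$ stays well below the largest simulated point.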

B.2     Proof  of  Theorem  2 

Using the convexity lemma (1), it suffices to show the finite-dimensional (fidi) convergence of $\Gamma_n(\cdot)$ to $\Gamma_\infty(\cdot)$,

$$(\Gamma_n(z_k), k \le K) \to_d (\Gamma_\infty(z_k), k \le K).$$

$\Gamma_n(z)$ is an integral with respect to $\ell_n$ over $\mathbb{R}^d$. There are two steps in the proof:

1. Approximate $\Gamma_n(z)$ over $\mathbb{R}^d$ by an integral over a compact subset of $\mathbb{R}^d$. For large $M \in \mathbb{R}_+$, approximate $\Gamma_n(z)$ by $\Gamma_n^M(z) \equiv \Gamma_{n1}^M(z)/\Gamma_{n2}^M$, where

$$\Gamma_{n1}^M(z) = \int_{|u| \le M} \rho(z - u)\,\ell_n(u)\,q(\beta_0 + u/n)\,du, \qquad \Gamma_{n2}^M = \int_{|u| \le M} \ell_n(u)\,q(\beta_0 + u/n)\,du.$$

Define their corresponding limit processes as

$$\Gamma_{\infty 1}^M(z) = \int_{|u| \le M} \rho(z - u)\,\ell_\infty(u)\,q(\beta_0)\,du, \qquad \Gamma_{\infty 2}^M = \int_{|u| \le M} \ell_\infty(u)\,q(\beta_0)\,du.$$

Now suppose that for each $z$ and each $\epsilon > 0$, $\delta > 0$, $\exists M \in \mathbb{R}_+$:

$$\limsup_{n} P(|\Gamma_n(z) - \Gamma_n^M(z)| > \epsilon) < \delta. \tag{16}$$

This is a tail-smallness property. It follows from the exponential smallness of the likelihood tails, which is demonstrated in the subsequent lemmas (4) and (5).

2. In view of (16), it suffices to show $(\Gamma_{n2}^M, \Gamma_{n1}^M(z_k), k \le K) \to_d (\Gamma_{\infty 2}^M, \Gamma_{\infty 1}^M(z_k), k \le K)$. This follows by two facts:

i. $E\ell_n(z) = 1 < \infty$
ii. $E|\ell_n^{1/2}(z') - \ell_n^{1/2}(z)|^2 \le C|z - z'|$

Fact i. is the definition of the likelihood ratio. Fact ii. is by Lemmas 5-6. These facts check the conditions of a limit theorem for integrals of random functions, Theorem 22 in Appendix I of IH (1982). ■

B.3     Proof of Theorem 3

By condition C.3, $-\ell_n$ is lower-semi-continuous. Two steps are needed:

•  Show finite-dimensional convergence of $\ell_n$ to the stochastic limit $\ell_\infty$,

•  Show the stochastic equi-lower-semi-continuity (e-lsc) of $\{-\ell_n\}$.

The first step was shown in Theorem 1. We demonstrate the second step next. Theorem 1.5.1 in IH(1982) shows that the conclusion of lemma (6) implies that $Z_n = O_p(1)$. Combining these two steps, the conclusion follows by Lemma 2.

It remains to show stochastic equi-lower-semi-continuity (e-lsc) of $\{-\ell_n\}$, or equivalently of $\{-Q_n = -\log \ell_n\}$. We know from the proof of Theorem 1 that

$$Q_{1n}(z) - Q_{1\infty}(z) \to_p 0, \qquad Q_{2n}(z) - \tilde Q_{2n}(z) \to_p 0,$$

uniformly in $z$ over fixed compact sets, where $\tilde Q_{2n}(z) = \int_E l_z(j,x)\,d\hat N(j,x)$. $Q_{1\infty}(z)$ is a fixed linear function. It therefore suffices to show s-e-lsc of $\{-\tilde Q_{2n}(z)\}$ only.

Because $\tilde Q_{2n}(z)$ is a piecewise constant function, it suffices to show that for any bounded set $B \subset \mathbb{R}^d$ and $\delta > 0$, there are open neighborhoods $V(z_1), \ldots, V(z_m)$ of some $z_1, \ldots, z_m$ s.t. $B \subset \cup_{k=1}^m V(z_k)$ and

$$\limsup_n P\left[\cup_{k=1}^m \left\{\inf_{z \in V(z_k)} -\tilde Q_{2n}(z) < -\tilde Q_{2n}(z_k)\right\}\right] < \delta.$$

This is done in several steps.

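(a) [A seemingly strange point process] Construct the point process:

$$\tilde N(\cdot) = \sum_{\epsilon_i > 0} 1[(n\epsilon_i p(X_i), X_i) \in \cdot] + \sum_{\epsilon_i < 0} 1[(n\epsilon_i q(X_i), X_i) \in \cdot].$$

Represent the points of $\tilde N$ equivalently in terms of the order statistics

$$\Gamma_{1n}, \Gamma_{2n}, \ldots \text{ of } \{n\epsilon_i p(X_i),\ i : \epsilon_i > 0\}, \qquad \Gamma'_{1n}, \Gamma'_{2n}, \ldots \text{ of } \{n\epsilon_i q(X_i),\ i : \epsilon_i < 0\},$$

so that $0 < \Gamma_{1n} < \Gamma_{2n} < \ldots$ and $0 > \Gamma'_{1n} > \Gamma'_{2n} > \ldots$. Denote by $X_{in}, X'_{in}$ the realizations of the covariate corresponding to $\Gamma_{in}, \Gamma'_{in}$. Thus $\tilde N(\cdot) = \sum_{i \ge 1} 1[(\Gamma_{in}, X_{in}) \in \cdot] + 1[(\Gamma'_{in}, X'_{in}) \in \cdot]$. $\tilde N$ is a continuous transform of $\hat N$, say $T(\hat N)$, from $M_p(E)$ to $M_p(E)$. Therefore, in $M_p(E)$, $\tilde N(\cdot) \Rightarrow T(N) = \sum_{i \ge 1} 1[(\Gamma_i, X_i) \in \cdot] + 1[(\Gamma'_i, X'_i) \in \cdot]$, where the distribution of the points is defined in the statement of Theorem 1. Also, by the continuous mapping theorem, in $\mathbb{R}^k$,

$$(\Gamma_{1n}, \ldots, \Gamma_{kn}) \to_d (\Gamma_1, \ldots, \Gamma_k).$$

We need all of this to characterize the equisemi-continuity of $\tilde Q_{2n}$. Write

$$\tilde Q_{2n}(z) = \int_{E: j > 0} l_z(j,x)\,d\hat N(j,x) + \int_{E: j < 0} l_z(j,x)\,d\hat N(j,x) \equiv Q^+_{2n}(z) + Q^-_{2n}(z).$$

We examine discontinuities of $\tilde Q_{2n}$ by examining those of $Q^+_{2n}(z)$ and $Q^-_{2n}(z)$. Because the arguments are identical for either part, consider only part $Q^+_{2n}(z)$.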

(b) [Picking the Cover] Fix a bounded set $B \subset \mathbb{R}^d$. Cover $X$ by the minimal number of closed equal-sized cubes $\{X_\phi(x_j), j \le J(\phi)\}$ with the side-length equal to $\phi$ and the centers of cubes denoted as $x_j$. Construct sets

$$\{V_{kj},\ k = -m, \ldots, m,\ j \le J(\phi)\} \subset \mathbb{R}^d$$

as $V_{kj} \equiv \{z \in \mathbb{R}^d : \nu_k - \varphi < p(x)x'z < \nu_k + \varphi,\ \forall x \in X_\phi(x_j)\}$, where $\varphi > 0$ and

$$\nu_k = k\varphi, \quad \text{for } k \in \{-m, \ldots, 0, \ldots, m\}.$$

Since $X$ is a nondegenerate compact subset of $\mathbb{R}^d$, and $p(x)$ is bounded away from 0 and above by assumption, $\{\cup_{kj} V_{kj}\}$ can cover any given bounded set $B$ by selecting sufficiently large $m$.
(c) [Number of break-points is $O_p(1)$ and separated for small $\varphi$] Consider argument $z$ in $\cup_j V_{kj}$; then a discontinuity in $Q^+_{2n}(z)$ can potentially occur in $\cup_j V_{kj}$ only if there exist $z^* \in \cup_j V_{kj}$ and $(\Gamma_{in}, X_{in})$ s.t.

$$\Gamma_{in} = p(X_{in})X_{in}'z^*, \tag{17}$$

where it must be that

$$\nu_k - \varphi < \Gamma_{in} < \nu_k + \varphi.$$

If there is such $(\Gamma_{in}, X_{in})$, we say that $Q^+_{2n}$ has a breakpoint in $\cup_j V_{kj}$. Note that $Q^+_{2n}(z)$ cannot have breakpoints in $V_{kj}$ with $k < 0$ because $\Gamma_{in} > 0$. Define $\mathcal{N}_n = \#\{i : \Gamma_{in} \le \kappa\}$, $\mathcal{N} = \#\{i : \Gamma_i \le \kappa\}$, where $\kappa = \sup_{x \in X, z \in B} p(x)x'z$. $\mathcal{N}_n$ is the upper bound on the number of breakpoints of $Q^+_{2n}(z)$ in set $B$. By the continuous mapping theorem, $\mathcal{N}_n \to_d \mathcal{N}$ in $\mathbb{R}$. So the number of breakpoints $O(\mathcal{N}_n)$ is stochastically bounded, $O_p(1)$. Furthermore, break-points are separated: no more than one break-point can happen in $\cup_j V_{kj}$ with probability arbitrarily close to one if $\varphi$ is sufficiently small. Define $A_k$ to be the event that $Q^+_{2n}(z)$ has more than two break-points in $\cup_j V_{kj}$. Then

$$\limsup_n P[\cup_k A_k] \le \limsup_n P\left[\min_{1 < i \le K} |\Gamma_{in} - \Gamma_{(i-1)n}| \le 2\varphi\right] + \limsup_n P[\mathcal{N}_n > K]$$
$$\le P\left[\min_{1 < i \le K} |\Gamma_i - \Gamma_{(i-1)}| \le 2\varphi\right] + P[\mathcal{N} > K] < \delta/2,$$

which is achieved by setting $K$ sufficiently large so that $P[\mathcal{N} > K] < \delta/4$, followed by setting $\varphi$ sufficiently small so that $P[\min_{1 < i \le K} |\Gamma_i - \Gamma_{(i-1)}| \le 2\varphi] < \delta/4$, which is possible since

$$E\left[\min_{1 < i \le K} |\Gamma_i - \Gamma_{(i-1)}|\right] = E\left[\min_{1 < i \le K} |\mathcal{E}_i|\right] > 0.$$

From now on, $K$ and $\varphi$ are fixed.
(c) ["Smart" grid-points by setting $\phi$ small] Next construct "centers" $z_{kj}$ in $V_{kj}$ so that

$$\nu_k - \varphi < x'z_{kj} < \nu_k - \varphi + \eta, \quad \forall x \in X_\phi(x_j),$$

where $\eta : 0 < \eta \ll \varphi$ will be set sufficiently small in the next step. Depending on $\eta$, to satisfy the constraints, we will set $\phi$ sufficiently small as well.

(d) [Stochastic Equi-semicontinuity] Now follows the verification of stochastic equi-semicontinuity:

$$\limsup_n P\left[\cup_{kj}\left\{\inf_{z \in V_{kj}} -Q^+_{2n}(z) < -Q^+_{2n}(z_{kj})\right\}\right] \le \limsup_n P[B(\eta)] + \limsup_n P[\cup_k A_k],$$

where $B(\eta)$ is the event that $\{\Gamma_{in}, i \le K\}$ are separated, and at least one of them falls into one of the "small" disjoint sets of the form

$$[\nu_{k_i} - \varphi,\ \nu_{k_i} - \varphi + \eta], \quad i \le K.$$

The bound is true because the last event contains the event that $(\Gamma_{in}, i \le K)$ are separated, and one of them falls into a set of the form $[\nu_{k_i} - \varphi, X_{in}'z_{kj}]$, which actually is the event

$$\cup_{kj}\left\{\inf_{z \in V_{kj}} -Q^+_{2n}(z) < -Q^+_{2n}(z_{kj})\right\} \cap (\cup_k A_k)^c,$$

because $-Q^+_{2n}(z)$ is piecewise-constant and can only jump up if the index $X_{in}'z$ increases. Now because $(\Gamma_{in}, i \le K) \to_d (\Gamma_i, i \le K)$, which have a bounded density, it follows that

$$\limsup_n P[B(\eta)] = O(K\eta) < \delta/2$$

by picking sufficiently small $\eta$. (Note that $K$ stays fixed and its choice does not depend on $\eta$.) Together with the bound on $\limsup_n P[\cup_k A_k]$ obtained in the break-point step, $\limsup_n P[B(\eta)] + \limsup_n P[\cup_k A_k] < \delta$. ■
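
The separation argument in the break-point step rests on the fact that the gaps between consecutive limit points are i.i.d. unit exponentials, so the smallest of $K$ gaps is exponential with rate $K$. The sketch below checks this by Monte Carlo. It is an illustration under assumptions of ours: it uses all $K$ gaps including the one from $\Gamma_0 = 0$ (the proof's minimum runs over a slightly different index range), and the function names are hypothetical.

```python
import numpy as np

def min_gap_prob_mc(K, phi, n_sim=200_000, seed=0):
    """Monte Carlo estimate of P(min of K unit-exponential gaps <= 2 * phi)."""
    rng = np.random.default_rng(seed)
    gaps = rng.exponential(size=(n_sim, K))   # gaps Gamma_i - Gamma_{i-1}, with Gamma_0 = 0
    return float(np.mean(gaps.min(axis=1) <= 2.0 * phi))

def min_gap_prob_exact(K, phi):
    """Closed form: the minimum of K i.i.d. Exp(1) variables is Exp(K)."""
    return 1.0 - np.exp(-2.0 * phi * K)

K = 5
for phi in (0.1, 0.01, 0.001):
    print(phi, min_gap_prob_mc(K, phi), min_gap_prob_exact(K, phi))
```

For fixed $K$ the probability is of order $2\varphi K$, so it can be pushed below $\delta/4$ by shrinking $\varphi$ after $K$ has been chosen, which is the order of choices made in the proof.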

B.4     Lemmas  4-6 

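A critical ingredient in the proof of Theorem 2 is the requirement of tail smallness. In this section such properties of

$$\ell_n(z) = L_n(\beta + z/n)/L_n(\beta)$$

are established, where $z \in \mathbb{R}^d$.

Lemma 4 Suppose $\exists b > 0$, $B > 0$, s.t. $\forall z \in \mathbb{R}^d$, $z' \in \mathbb{R}^d$, $\exists n_0$ s.t. $\forall n \ge n_0$:

$$(i)\ E_{P_\beta}\ell_n(z)^{1/2} \le e^{-b|z|}; \qquad (ii)\ E_{P_\beta}|\ell_n(z)^{1/2} - \ell_n(z')^{1/2}|^2 \le B|z - z'|,$$

uniformly in $\beta$ in an open ball at $\beta_0$. Then for

$$\bar\Gamma^{M}_{n1} = \frac{\int_{|u| > M} \ell_n(u)\,q(\beta_0 + u/n)\,du}{\int_{\mathbb{R}^d} \ell_n(u)\,q(\beta_0 + u/n)\,du} \quad \text{and} \quad \bar\Gamma^{M}_{n2}(z) = \frac{\int_{|u| > M} \rho(z - u)\,\ell_n(u)\,q(\beta_0 + u/n)\,du}{\int_{\mathbb{R}^d} \ell_n(u)\,q(\beta_0 + u/n)\,du}$$

it is true that under $P = P_{\beta_0}$

$$\lim_{M \to \infty} \limsup_{n \to \infty} E\bar\Gamma^{M}_{n1} = 0, \qquad \lim_{M \to \infty} \limsup_{n \to \infty} E\bar\Gamma^{M}_{n2}(z) = 0, \tag{18}$$

so that

$$\lim_{M \to \infty} \limsup_{n \to \infty} P(|\Gamma_n(z) - \Gamma_n^M(z)| > \epsilon) = 0. \tag{19}$$

IH(1982) studied non-regression models and verified the conditions of this lemma by bounding a Hellinger distance. Our approach requires a modification: we bound a conditional Hellinger distance for the model R. The Conditional Hellinger Distance, denoted as $r_2(\beta; \beta + h)$, is defined as:

$$r_2(\beta; \beta + h) = \left[\int\!\!\int \left|f^{1/2}(y - x'(\beta + h) \mid x; \beta + h) - f^{1/2}(y - x'\beta \mid x; \beta)\right|^2 dy\,F_X(dx)\right]^{1/2}.$$

Upper and lower bounds on $r_2(\cdot)$ are used to verify the conditions of lemma 4: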

Lemma 5 If there are $a > 0$, $A > 0$ such that for each $h > 0$ small,

$$\inf_\beta r_2^2(\beta; \beta + h) \ge \frac{a|h|}{1 + |h|} \quad \text{and} \quad \sup_\beta r_2^2(\beta; \beta + h) \le A|h|,$$

where sup/inf are computed over $\beta$ in an open ball at $\beta_0$, then $\exists b > 0$, $B > 0$, s.t. $\forall z \in \mathbb{R}^d$, $z' \in \mathbb{R}^d$, $\exists n_0$ s.t. $\forall n \ge n_0$:

$$\sup_\beta E_{P_\beta}\ell_n(z)^{1/2} \le e^{-b|z|}; \qquad \sup_\beta E_{P_\beta}|\ell_n(z)^{1/2} - \ell_n(z')^{1/2}|^2 \le B|z - z'|.$$

The following lemma verifies the conditions of lemma (5) and hence also verifies the bound on the conditional Hellinger distance in model R.

Lemma 6 In model R, suppose (C.1)-(C.3) hold. Then $\exists a > 0$, $A > 0$, such that for all $h > 0$ small enough, the following is true:

$$\inf_\beta r_2^2(\beta; \beta + h) \ge a|h| \quad \text{and} \quad \sup_\beta r_2^2(\beta; \beta + h) \le A|h|.$$
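Proofs of lemmas (4) to (6):

Lemma (4): (18) is a special case of lemma 1.5.2 and Theorem 1.10.2 of IH(1982). (19) follows immediately from (18). ■

Lemma (5): It follows from the definition of the conditional Hellinger distance that

$$r_2^2(\beta; \beta + z/n) = 2\left[1 - \int\!\!\int f^{1/2}(y - x'(\beta + z/n) \mid x; \beta + z/n)\,f^{1/2}(y - x'\beta \mid x; \beta)\,dy\,F_X(dx)\right].$$

Also

$$E_{P_\beta}\ell_n(z)^{1/2} = \prod_{i=1}^n \int\!\!\int \left[\frac{f(y_i - x_i'(\beta + z/n) \mid x_i; \beta + z/n)}{f(y_i - x_i'\beta \mid x_i; \beta)}\right]^{1/2} f(y_i - x_i'\beta \mid x_i; \beta)\,dy_i\,F_X(dx_i)$$
$$= \left[\int\!\!\int f^{1/2}(y - x'(\beta + z/n) \mid x; \beta + z/n)\,f^{1/2}(y - x'\beta \mid x; \beta)\,dy\,F_X(dx)\right]^n,$$

so we can bound, uniformly in $P = P_\beta$,

$$E\ell_n(z)^{1/2} = \left[1 - \tfrac{1}{2}r_2^2(\beta; \beta + z/n)\right]^n \le e^{-\frac{n}{2}r_2^2(\beta; \beta + z/n)} \le e^{-b|z|}.$$

Similarly,

$$E|\ell_n(z)^{1/2} - \ell_n(z')^{1/2}|^2 = E\ell_n(z) + E\ell_n(z') - 2E\ell_n(z)^{1/2}\ell_n(z')^{1/2}$$
$$= 2\left[1 - \left(E_X \int f^{1/2}(y - X'(\beta + z/n) \mid X; \beta + z/n)\,f^{1/2}(y - X'(\beta + z'/n) \mid X; \beta + z'/n)\,dy\right)^n\right]$$
$$\le 2n\left[1 - E_X \int f^{1/2}(y - X'(\beta + z/n) \mid X; \beta + z/n)\,f^{1/2}(y - X'(\beta + z'/n) \mid X; \beta + z'/n)\,dy\right]$$
$$= n\,r_2^2(\beta + z/n; \beta + z'/n) \le A|z - z'|. \quad ■$$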
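Lemma (6): To obtain the uniform upper bound, for $h > 0$ small enough,

$$r_2^2(\beta; \beta + h) = E_X \int \left(f^{1/2}(y - X'(\beta + h) \mid X; \beta + h) - f^{1/2}(y - X'\beta \mid X; \beta)\right)^2 dy$$
$$\le E_X \int \left|f(y - X'(\beta + h) \mid X; \beta + h) - f(y - X'\beta \mid X; \beta)\right| dy$$
$$\le E_X \int_{[X'\beta, X'(\beta + h)]} \left|f(y - X'(\beta + h) \mid X; \beta + h) - f(y - X'\beta \mid X; \beta)\right| dy$$
$$+ E_X \int_{[X'\beta, X'(\beta + h)]^c} \left|f(y - X'(\beta + h) \mid X; \beta + h) - f(y - X'\beta \mid X; \beta)\right| dy,$$

where $[a,b] = [a,b]$ if $a \le b$ and $= [b,a]$ if $b < a$. The first inequality follows since $|a - b|^2 \le |(a + b)(a - b)| = |a^2 - b^2|$ for $a \ge 0$ and $b \ge 0$. This is further bounded by

$$2E_X |X'h|\,(p(X,\beta) + q(X,\beta)) + E_X \int\!\!\int_0^1 \left|h'\frac{\partial}{\partial\beta}f(y - X'(\beta + uh) \mid X; \beta + uh)\right| du\,dy,$$

which by compactness of $X$, the Cauchy-Schwarz inequality and changing the order of integration by Fubini is bounded by

$$\text{const} \times |h| + \text{const} \times |h| \int_0^1 E_X \int \left|\frac{\partial}{\partial\beta}f(y - X'(\beta + uh) \mid X; \beta + uh)\right| dy\,du.$$

The upper bound is now obtained by condition C.3.

Now bound uniformly in $\beta$ the conditional Hellinger distance from below:

$$r_2^2(\beta; \beta + h) \ge E_X \int_{[X'\beta, X'(\beta + h)]} \left(f^{1/2}(y - X'(\beta + h) \mid X; \beta + h) - f^{1/2}(y - X'\beta \mid X; \beta)\right)^2 dy$$
$$\ge E_X \int_{[X'\beta, X'(\beta + h)]} \tfrac{1}{2} \times \left|p^{1/2}(X,\beta) - q^{1/2}(X,\beta)\right|^2 dy$$
$$\ge \tfrac{1}{2} \times \delta \times E(|X'h|) = \tfrac{1}{2} \times \delta \times |h| \times E_X\left|\frac{X'h}{|h|}\right|$$
$$\ge \text{const} \times |h|,$$

where the last line follows because $\inf_{c \in \mathbb{R}^d : |c| = 1} E|X'c| > 0$, since $Var(X) > 0$. ■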
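
The linear-in-$|h|$ behaviour of the squared Hellinger distance, which separates these jump models from regular models (where it is quadratic in $|h|$), can be seen in a toy case. The sketch assumes a model that is not the paper's: no regressors, and a unit-exponential density shifted to start at the boundary, $f(y - \beta) = e^{-(y - \beta)}1(y > \beta)$, so that $p = 1$ and $q = 0$. In that case the squared distance has the closed form $2(1 - e^{-|h|/2})$.

```python
import numpy as np

def hellinger_sq_numeric(h, upper=60.0, m=600_001):
    """Riemann-sum value of int (sqrt(f_h) - sqrt(f_0))^2 dy for the shifted exponential."""
    y = np.linspace(-abs(h) - 1.0, upper, m)
    f0 = np.where(y > 0.0, np.exp(-y), 0.0)         # density with boundary at 0
    fh = np.where(y > h, np.exp(-(y - h)), 0.0)     # density with boundary at h
    dy = y[1] - y[0]
    return float(np.sum((np.sqrt(fh) - np.sqrt(f0)) ** 2) * dy)

def hellinger_sq_exact(h):
    """Closed form 2 * (1 - exp(-|h| / 2)) for this toy model."""
    return 2.0 * (1.0 - np.exp(-abs(h) / 2.0))

for h in (0.4, 0.2, 0.1, 0.05):
    # The ratio to |h| stays bounded away from 0 and infinity as h shrinks.
    print(h, hellinger_sq_numeric(h), hellinger_sq_exact(h), hellinger_sq_exact(h) / h)
```

The printed ratio tends to 1 as $h \to 0$, matching bounds of the form $a|h| \le r_2^2 \le A|h|$ in Lemma 6 for this special case.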

C     Proofs  for  the  Non-Linear  Model 

In the proofs we set the local parameter sequence $\gamma_n = \gamma_0$. Putting through the general local sequence $\gamma_n$ does not change the arguments but introduces a lot of notational complexity.

C.1     Proof of Theorem 4

As in Theorem 1, we split the log likelihood ratio process $Q_n(z) = \ln L_n(\beta_0 + H_n z)/L_n(\beta_0)$ into the "jump" part and the "smooth" part, and analyze each part separately. For $z = (u,v)$,

$$Q_n(z) = \sum_{i=1}^n q_{in}(z) \times [1(\epsilon_i > \Delta_n(X_i,u)/n \vee 0) + 1(\epsilon_i < \Delta_n(X_i,u)/n \wedge 0)]$$
$$+ \sum_{i=1}^n q_{in}(z) \times [1(0 < \epsilon_i < \Delta_n(X_i,u)/n) + 1(0 > \epsilon_i > \Delta_n(X_i,u)/n)]$$
$$\equiv Q_{1n}(z) + Q_{2n}(z),$$

where

$$q_{in}(z) = \ln \frac{f(Y_i - g(X_i, \beta_0 + u/n) \mid X_i, \beta_0 + u/n, \alpha_0 + v/\sqrt{n})}{f(Y_i - g(X_i, \beta_0) \mid X_i, \beta_0, \alpha_0)},$$

$$\Delta_n(X_i,u) = n\,(g(X_i, \beta_0 + u/n) - g(X_i, \beta_0)).$$

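I. We first find the uniform limit of $Q_{1n}(z)$. In this part only, we apply arguments similar to those of IH (1982). For each $z$, using assumptions (E.1) to (E.3) with a second order Taylor expansion:

$$Q_{1n}(z) = -u'\frac{1}{n}\sum_{i=1}^n \frac{\partial g(X_i,\beta_0)}{\partial\beta}\frac{f'(\epsilon_i \mid X_i,\gamma_0)}{f(\epsilon_i \mid X_i,\gamma_0)} + \cdots$$
$$= u'E\,\Delta(X)(p(X) - q(X)) + \tilde Q_{1n}(z) = u'\mu + \tilde Q_{1n}(z),$$

where the definitions of $\mu$ and $\tilde Q_{1n}(z)$ are obvious. The information matrix equality for $\alpha$ gives $-J = E\,\partial^2 \ln f(\epsilon \mid X, \gamma_0)/\partial\alpha\partial\alpha'$, and the CLT gives $\frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{\partial}{\partial\alpha}\ln f(\epsilon_i \mid X_i, \gamma_0) \Rightarrow N(0,J)$. Hence

$$\tilde Q_{1n}(z) \Rightarrow \tilde Q_{1\infty}(z) = W'v - \tfrac{1}{2}v'Jv \quad \text{in } \ell^\infty(Z),$$

a continuous Gaussian process, provided we can show that this convergence is uniform by demonstrating stochastic equicontinuity of $Q_{1n}(z)$. In particular, for any $|z_1 - z_2| \to 0$, we can show that terms like

$$\left|\sum_{i=1}^n q_{in}(z_1)1(\epsilon_i > \Delta_n(X_i,u_1)/n \vee 0) - q_{in}(z_2)1(\epsilon_i > \Delta_n(X_i,u_2)/n \vee 0)\right| \to_p 0.$$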
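Split the term into

$$\sum_{i=1}^n 1(\epsilon_i > \Delta_n(X_i,u_1)/n \vee \Delta_n(X_i,u_2)/n \vee 0)$$
$$\times \big|\ln f(\epsilon_i - \Delta_n(X_i,u_1)/n \mid X_i; \beta_0 + u_1/n, \alpha_0 + v_1/\sqrt{n}) - \ln f(\epsilon_i - \Delta_n(X_i,u_2)/n \mid X_i; \beta_0 + u_2/n, \alpha_0 + v_2/\sqrt{n})\big|$$
$$+ \sum_{i=1}^n 1(\epsilon_i \in (0, \Delta_n(X_i,u_1)/n \vee \Delta_n(X_i,u_2)/n \vee 0)) \times \max_{j=1,2}\left|\ln \frac{f(\epsilon_i - \Delta_n(X_i,u_j)/n \mid \beta_0 + u_j/n, \alpha_0 + v_j/\sqrt{n})}{f(\epsilon_i \mid \beta_0, \alpha_0)}\right|.$$

By the argument of the proof of Theorem 1, the second term is bounded by $\sum_{i=1}^n 1(\epsilon_i \in (0, C/n \vee 0))$ times a vanishing factor, and so converges to 0 in probability. Each summand in the first summation can be bounded by, for some $z'$ between $z_1$ and $z_2$,

$$\sum_{i=1}^n 1(\epsilon_i > \Delta_n(X_i,u_1)/n \vee \Delta_n(X_i,u_2)/n \vee 0) \times$$
$$\left(\left|\frac{\partial}{\partial\beta}\ln f(\epsilon_i - \Delta_n(X_i,u')/n \mid \beta_0 + u'/n, \alpha_0 + v'/\sqrt{n})\right|\frac{|u_1 - u_2|}{n} + \frac{\partial}{\partial\alpha}\ln f(\epsilon_i - \Delta_n(X_i,u')/n \mid \beta_0 + u'/n, \alpha_0 + v'/\sqrt{n})'\,\frac{v_1 - v_2}{\sqrt{n}}\right).$$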
Using assumption (E.5), the first term in this summation can be bounded by

$$C\,\frac{1}{n}\sum_{i=1}^n \left|\frac{\partial}{\partial\beta}\ln f(\cdot)\right| |u_1 - u_2| = O_p(1)\,|u_1 - u_2|.$$

The second term in the summation can be further expanded as, for $z''$ between $z'$ and $0$,

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n 1(\epsilon_i > \Delta_n(X_i,u_1)/n \vee \Delta_n(X_i,u_2)/n \vee 0)\,\frac{\partial}{\partial\alpha}\ln f(\epsilon_i \mid X_i, \gamma_0)'(v_1 - v_2) + \frac{1}{n}\sum_{i=1}^n v''{}'\,\frac{\partial^2 \ln f(\cdot)}{\partial\alpha\partial\alpha'}\,(v_1 - v_2).$$

Assumption (E.6) implies that the second term is bounded by

$$\text{const} \times |v''(v_1 - v_2)|\,\frac{1}{n}\sum_{i=1}^n C''(\epsilon_i, X_i) = |v_1 - v_2|\,O_p(1).$$

The first term by the CLT and assumption (E.4) is $O_p(1)$ and is linear in $v_1 - v_2$. (Note that the indicator in the summation can be replaced by $1(\epsilon_i > 0)$ by the Chebyshev inequality.) Thus as $|z_1 - z_2| \to 0$ the entire term goes to 0 in probability. This completes the proofs of stochastic equicontinuity and uniform convergence of $Q_{1n}(z)$ for the nonlinear model.

II. Limit of $Q_{2n}(z)$. As in the proof of Theorem 1, in the expression for $Q_{2n}(z)$ we can replace $q_{in}(z)$ by either $\ln q(X_i)/p(X_i)$ or $\ln p(X_i)/q(X_i)$, uniformly in $z$ in $Z$, depending on the sign of the r.v. $\epsilon_i$. Note that these expressions may equal $-\infty$. So uniformly in $z$ over $Z$, $Q_{2n}(z) = \tilde Q_{2n}(z) + o_p(1)$, where

$$\tilde Q_{2n}(z) = \sum_{i=1}^n \left[\ln\frac{q(X_i)}{p(X_i)}\,1(0 < n\epsilon_i < \Delta_n(X_i,u)) + \ln\frac{p(X_i)}{q(X_i)}\,1(0 > n\epsilon_i > \Delta_n(X_i,u))\right].$$

Next we note that by straightforward calculations

$$E\sum_{i=1}^n \Big[\,|1(0 < n\epsilon_i < \Delta_n(X_i,u)) - 1(0 < n\epsilon_i < \Delta(X_i,u))| + |1(0 > n\epsilon_i > \Delta_n(X_i,u)) - 1(0 > n\epsilon_i > \Delta(X_i,u))|\,\Big] = o(1),$$

where $\Delta(X_i,u) = \frac{\partial g(X_i,\beta_0)}{\partial\beta'}u$, which yields that for any fixed $z$,

$$\tilde Q_{2n}(z) = \sum_{i=1}^n \left[\ln\frac{q(X_i)}{p(X_i)}\,1(0 < n\epsilon_i < \Delta(X_i,u)) + \ln\frac{p(X_i)}{q(X_i)}\,1(0 > n\epsilon_i > \Delta(X_i,u))\right] + o_p(1).$$

Having obtained the "linearized expression," the finite-dimensional convergence of $Q_{2n}(z)$ now follows by arguments identical to those in Theorem 1. It remains to show that $(Q_{2n}(z_i), i \le J)$ and $(Q_{1n}(z_i), i \le J)$, for any finite $J$, are asymptotically independent. This follows by noting that the dependence between these terms is realized through sums that disappear in probability for any fixed $z = (u,v)$: $\frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{\partial}{\partial\alpha}\ln f(\epsilon_i \mid X_i, \gamma_0)\,1(n\epsilon_i \in (0, \Delta_n(X_i,u))) = o_p(1)$ and $\frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{\partial}{\partial\alpha}\ln f(\epsilon_i \mid X_i, \gamma_0)\,1(n\epsilon_i \in (\Delta_n(X_i,u), 0)) = o_p(1)$. ■
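
The linearization step above replaces $\Delta_n(x,u) = n(g(x,\beta_0 + u/n) - g(x,\beta_0))$ by its first-order limit $\Delta(x,u)$. The sketch below checks the size of the approximation error for a hypothetical regression function $g(x,\beta) = \exp(x'\beta)$. This functional form and the numerical values are our assumptions for illustration, not the paper's specification.

```python
import numpy as np

def g(x, beta):
    """Hypothetical regression function g(x, beta) = exp(x'beta)."""
    return np.exp(x @ beta)

def grad_g(x, beta):
    """Gradient of g with respect to beta."""
    return np.exp(x @ beta) * x

def delta_n(x, u, beta0, n):
    """Scaled finite difference Delta_n(x, u) = n * (g(x, beta0 + u/n) - g(x, beta0))."""
    return n * (g(x, beta0 + u / n) - g(x, beta0))

def delta_lin(x, u, beta0):
    """Linearized index Delta(x, u) = (dg/dbeta)' u evaluated at beta0."""
    return grad_g(x, beta0) @ u

x = np.array([1.0, 0.5])
beta0 = np.array([0.2, -0.1])
u = np.array([1.0, 2.0])
for n in (10, 1_000, 100_000):
    # The gap shrinks at rate 1/n for this smooth g.
    print(n, delta_n(x, u, beta0, n) - delta_lin(x, u, beta0))
```

Since the indicator sets in $\tilde Q_{2n}$ differ only where $n\epsilon_i$ falls between $\Delta_n$ and $\Delta$, a gap of order $1/n$ is what makes the expected number of mismatched indicators vanish.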

C.2     Proof  of  Theorem  5 

Similar to theorem 2, the proof is a straightforward application of convexity lemma (1), by which it suffices to show the finite-dimensional (fidi) convergence of $\Gamma_n(\cdot)$ to $\Gamma_\infty(\cdot)$,

$$(\Gamma_n(z_k), k \le K) \to_d (\Gamma_\infty(z_k), k \le K).$$

As before we follow two steps.

1. First we approximate the integral $\Gamma_n(z)$ over $\mathbb{R}^d$ by an integral over a compact subset of $\mathbb{R}^d$. For each $M \in \mathbb{R}_+$, the approximation of $\Gamma_n(z)$ is given by $\Gamma_n^M(z) \equiv \Gamma_{n1}^M(z)/\Gamma_{n2}^M$, where

$$\Gamma_{n1}^M(z) = \int_{|z'| \le M} \rho(z - z')\,\ell_n(z')\,q(\beta_0 + H_n z')\,dz', \qquad \Gamma_{n2}^M = \int_{|z'| \le M} \ell_n(z')\,q(\beta_0 + H_n z')\,dz'.$$

Define

$$\Gamma_{\infty 1}^M(z) = \int_{|z'| \le M} \rho(z - z')\,\ell_\infty(z')\,q(\beta_0)\,dz', \qquad \Gamma_{\infty 2}^M = \int_{|z'| \le M} \ell_\infty(z')\,q(\beta_0)\,dz'.$$

Similar to (19), lemmas (7) and (8) in section (C.5) demonstrate that the tail of the likelihood ratio process is exponentially small: for each $\epsilon > 0$, $\delta > 0$, $\exists M \in \mathbb{R}_+$:

$$\limsup_n P(|\Gamma_n(z) - \Gamma_n^M(z)| > \epsilon) < \delta. \tag{20}$$

2. In view of (20), it suffices to show $(\Gamma_{n2}^M, \Gamma_{n1}^M(z_k), k \le K) \to_d (\Gamma_{\infty 2}^M, \Gamma_{\infty 1}^M(z_k), k \le K)$. This follows by two facts:

i. $E\ell_n(z) = 1 < \infty$

ii. $E|\ell_n^{1/2}(z') - \ell_n^{1/2}(z)|^2 \le C|z - z'|$

Fact i. is by definition of the likelihood ratio. Fact ii. is by Lemma 7. These facts check the conditions of a limit theorem for integrals of random functions, Theorem 22 in Appendix I of IH (1982). ■
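
The truncated ratio $\Gamma^M_{1}(z)/\Gamma^M_{2}$ used in both Bayes proofs can be evaluated on a grid. The sketch below is a minimal scalar illustration under assumptions of ours: a flat prior (so $q$ cancels), a hypothetical limit likelihood ratio $\ell(u) = e^{u}1(u \le 1)$ of the one-sided kind that arises when $q(x) = 0$, and simple Riemann sums. The function name is hypothetical.

```python
import numpy as np

def truncated_bayes_estimate(lik_ratio, loss, M, grid_size=2001):
    """Minimize the truncated posterior risk Gamma_1^M(z) / Gamma_2^M over a grid of z."""
    u = np.linspace(-M, M, grid_size)
    du = u[1] - u[0]
    w = lik_ratio(u)
    denom = np.sum(w) * du                       # Gamma_2^M with a flat prior
    risks = np.array([np.sum(loss(z - u) * w) * du / denom for z in u])
    return float(u[np.argmin(risks)]), risks

def lik_ratio(u):
    """Assumed one-sided likelihood ratio: grows like exp(u), then drops to 0 past u = 1."""
    return np.where(u <= 1.0, np.exp(u), 0.0)

z_quad, _ = truncated_bayes_estimate(lik_ratio, lambda d: d ** 2, M=10.0)
z_abs, _ = truncated_bayes_estimate(lik_ratio, np.abs, M=10.0)
print(z_quad, z_abs)
```

Under quadratic loss the minimizer is the posterior mean (close to 0 here) and under absolute loss it is the posterior median (close to $1 - \ln 2$). Increasing `M` beyond 10 barely changes either value, which is the tail-smallness property (20) in this toy case.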

C.3     Proof  of  Theorem  6 

By assumption $-\ell_n$ is lower-semi-continuous. Two steps are needed:

i. Show finite-dimensional convergence of $\ell_n$ to the stochastic limit $\ell_\infty$,
ii. Show the stochastic equi-lower-semi-continuity (e-lsc) of $\{-\ell_n\}$.

Theorem 4 handled step i. We demonstrate step ii next. Given these two steps, the conclusion follows by Lemma 2, since $Z_n = O_p(1)$ by the tail smallness lemma 8 and Theorem 1.5.1 in IH(1982).

It therefore remains to show the stochastic equi-lower-semi-continuity (e-lsc) of $\{-\ell_n\}$, or equivalently of $\{-Q_n = -\log \ell_n\}$. From the proof of Theorem 4:

$$Q_{1n}(z) \Rightarrow u'\mu + \tilde Q_{1\infty}(v) \quad \text{in } \ell^\infty(Z), \tag{21}$$

where $\tilde Q_{1\infty}(\cdot)$ is the Gaussian process defined in the proof of Theorem 4 and $Z$ is any compact subset of $\mathbb{R}^d$. Also

$$Q_{2n}(z) - \tilde Q_{2n}(z) \to_p 0,$$

uniformly in $z$ over $Z$, where

$$\tilde Q_{2n}(z) = \int_E l_{un}(j,x)\,d\hat N(j,x),$$

$$l_{un}(j,x) = \ln\frac{q(x)}{p(x)}\,1(0 < j < \Delta_n(x,u)) + \ln\frac{p(x)}{q(x)}\,1(0 > j > \Delta_n(x,u)).$$

Because of (21), it suffices to show s-e-lsc of $\{-\tilde Q_{2n}(z)\}$ only. Because $\tilde Q_{2n}(z)$ depends on $z$ only through $u$, we may write $\tilde Q_{2n}(u)$ instead. Because $\tilde Q_{2n}(u)$ is a piecewise constant function, it suffices to show that for any bounded set $B$ in the space of $u$ and $\delta > 0$, there are open neighborhoods $V(u_1), \ldots, V(u_m)$ of some $u_1, \ldots, u_m$ s.t. $B \subset \cup_{k=1}^m V(u_k)$ and

$$\limsup_n P\left[\cup_{k=1}^m \left\{\inf_{u \in V(u_k)} -\tilde Q_{2n}(u) < -\tilde Q_{2n}(u_k)\right\}\right] < \delta.$$

This is done in several steps. The proof is nearly identical to the proof of Theorem 3, and only the minor differences are highlighted below.

(a) This step is identical to step (a) in the proof of Theorem 3, where the point process $\tilde N$ is constructed and its limit is found. This process allows us to demonstrate the equisemi-continuity of $\tilde Q_{2n}$. Write

$$\tilde Q_{2n}(u) = \int_{E: j > 0} l_{un}(j,x)\,d\hat N(j,x) + \int_{E: j < 0} l_{un}(j,x)\,d\hat N(j,x) \equiv Q^+_{2n}(u) + Q^-_{2n}(u).$$

We wish to examine the nature of discontinuities of $\tilde Q_{2n}$ by examining those of $Q^+_{2n}$ and $Q^-_{2n}$. Because the arguments are identical for either part, consider only part $Q^+_{2n}$.
(b) This step is identical to step (b) in the proof of Theorem 3 except for the construction of $\{V_{kj}\}$. Here, for sufficiently large $n$, $V_{kj} \equiv \{u : \nu_k - \varphi < p(x)\Delta_n(x,u) < \nu_k + \varphi,\ \forall x \in X_\phi(x_j)\}$, where $\varphi > 0$ and $\nu_k = k\varphi$, for $k \in \{-m, \ldots, 0, \ldots, m\}$. Observe that $\Delta_n(X,u) = \frac{\partial g(X, \beta_0 + u^\dagger(X)/n)}{\partial\beta'}u$, where the gradient has a positive definite variance matrix uniformly in $u^\dagger$ in any compact set, as $n \to \infty$, by assumption. Thus the support of this vector is non-degenerate. Thus, as in Theorem 3, since also $X$ is compact and $p(x)$ is bounded away from 0 and above by assumption, $\{\cup_{kj} V_{kj}\}$ can cover any given bounded set $B$ by selecting sufficiently large $m$, for large $n$.
(c) [Number of break-points is $O_p(1)$ and separated for small $\varphi$] Consider argument $u$ in $\cup_j V_{kj}$; then a discontinuity in $Q^+_{2n}(u)$ (since it only depends on $u$ through the index $\Delta_n(x,u)$) can potentially occur in the set $\cup_j V_{kj}$ only if there exist $u^* \in \cup_j V_{kj}$ and $(\Gamma_{in}, X_{in})$ s.t.

$$\Gamma_{in} = p(X_{in})\Delta_n(X_{in}, u^*), \tag{22}$$

where it should be the case for large $n$ that $\nu_k - \varphi < \Gamma_{in} < \nu_k + \varphi$. If there is such $(\Gamma_{in}, X_{in})$, we say that $Q^+_{2n}$ has a breakpoint in $\cup_j V_{kj}$. Define $\mathcal{N}_n = \#\{i : \Gamma_{in} \le \kappa\}$, $\mathcal{N} = \#\{i : \Gamma_i \le \kappa\}$, where $\kappa = \sup_{x \in X, u \in B} p(x)\Delta(x,u) + \varphi$. For sufficiently large $n$, $\mathcal{N}_n$ is the upper bound on the number of breakpoints of $Q^+_{2n}(u)$ in set $B$. By the continuous mapping theorem, $\mathcal{N}_n \to_d \mathcal{N}$ in $\mathbb{R}$. So the number of breakpoints $O(\mathcal{N}_n)$ is stochastically bounded, $O_p(1)$. Furthermore, break-points are separated in the same sense as in the proof of Theorem 3. Define $A_k$ to be the event that $Q^+_{2n}(u)$ has more than two break-points in $\cup_j V_{kj}$. Then by arguments identical to those in the proof of Theorem 3, for any $\delta > 0$, $\varphi$ can be picked small so that $\limsup_n P[\cup_k A_k] < \delta/2$.
(c) [Grid-points by setting $\phi$ small] The next step is to construct the "centers" $u_{kj}$ of $V_{kj}$. Pick $u_{kj} \in V_{kj}$ so that for large $n$, $\nu_k - \varphi < \Delta_n(x, u_{kj}) < \nu_k - \varphi + \eta$, $\forall x \in X_\phi(x_j)$, where $\eta : 0 < \eta \ll \varphi$ will be set sufficiently small in the next step. Depending on $\eta$, to satisfy the constraints, set $\phi$ sufficiently small as well.

(d) [Stochastic Equi-semicontinuity] This step is identical to step (d) in the proof of Theorem 3, except we replace $X'z$ with $\Delta_n(X,u)$. ■

C.4     Proof  of  Theorems  7  and  8 

Proof of Theorem 7. The result is well known. Under the stated conditions, the posterior risk is finite for the Bayes estimator, so Theorem 1.1 in Ch.4 of Lehmann applies. ■
Proof of Theorem 8. $Z_n$ denotes the re-scaled Bayes estimator $H_n^{-1}(\hat\gamma_{Bayes} - \gamma)$.




As a preliminary step we have for large $n$, for $c_1, c_2 > 0$ and any $H$,

$$P_\gamma\{|Z_n| > H\} \le c_1\exp\{-c_2|H|\}, \tag{23}$$

which follows by Theorem 1.5.3 in Ibragimov and Hasminski, which requires the tail smallness conditions verified in Lemmas (7) and (8).

$Z_n$ under $P_\gamma$ depends on $\gamma$, which we emphasize by writing it as $Z_n^\gamma$. (23) and majorization of $\rho$ by a polynomial imply that for some $n_0$,

$$\{\rho(Z_n^\gamma),\ n \ge n_0,\ \gamma \text{ in open ball at } \gamma_0\} \text{ is uniformly integrable.} \tag{24}$$

Define $Z_n^\gamma(K)$ as $H_n^{-1}(\hat\gamma_{Bayes,\lambda_K} - \gamma)$, where $\hat\gamma_{Bayes,\lambda_K}$ is the Bayes estimator defined with respect to the prior weight $\lambda_K(x) = 1\{H_n^{-1}(x - \gamma_0) \in K\}$. By construction for large $H$:

$$P_\gamma\{|Z_n^\gamma(K)| > H\} = 0. \tag{25}$$

Thus for some $n_0$ and any compact sets $K$,

$$\{\rho(Z_n^\gamma(K) - \delta),\ n \ge n_0,\ \gamma \text{ in open ball at } \gamma_0\} \text{ is uniformly integrable.} \tag{26}$$

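Next we (a) evaluate

$$I \equiv R_\rho(\{\hat\gamma_{Bayes}\}, \mathbb{R}^d) \quad \text{and} \quad II(K) \equiv R_\rho(\{\hat\gamma_{Bayes,\lambda_K}\}, K)$$

and (b) show that $II(K)$ approaches $I$ from below as $K \uparrow \mathbb{R}^d$.

Step (a).

$$R_\rho(\{\hat\gamma_{Bayes}\}, K) = \lim_n \int_K E_{P_{\gamma_0 + H_n\delta}}\,\rho(Z_n^{\gamma_0 + H_n\delta})\,d\delta/\mathrm{Leb}(K) = \int_K E_{P_{\gamma_0}}\,\rho(Z_\infty)\,d\delta/\mathrm{Leb}(K) = E_{P_{\gamma_0}}\,\rho(Z_\infty)$$

by Theorem 5, equations (23)-(24), and the dominated convergence theorem. Thus

$$R_\rho(\{\hat\gamma_{Bayes}\}, \mathbb{R}^d) = E_{P_{\gamma_0}}\,\rho(Z_\infty).$$

Analogously,

$$R_\rho(\{\hat\gamma_{Bayes,\lambda_K}\}, K) = \lim_n \int_K E_{P_{\gamma_0 + H_n\delta}}\,\rho(Z_n^{\gamma_0 + H_n\delta}(K))\,d\delta/\mathrm{Leb}(K) = \int_K E_{P_{\gamma_0}}\,\rho(Z_\infty^\delta(K))\,d\delta/\mathrm{Leb}(K)$$

by Theorem 5 (which applies to $Z_n^{\gamma_0 + H_n\delta}(K)$ by the same argument as it applies to $Z_n^{\gamma_0 + H_n\delta}$), equations (25)-(26), and dominated convergence, where

$$Z_\infty^\delta(K) = \arg\inf_z \int_K \rho(z - u - \delta)\,\ell_\infty(u - \delta)\,du.$$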


Step (b). Next, for any $\delta$, $\int_K E_{P_{\gamma_0}}\,\rho(Z_\infty^\delta(K))\,d\delta/\mathrm{Leb}(K) \le E_{P_{\gamma_0}}\,\rho(Z_\infty)$. This follows because the left-hand side is the lower risk bound for $R_\rho(\{\hat\gamma_n\}, K)$. Thus no other estimator sequence achieves a lower risk value.
Rewrite the inequality as

$$\int_K E_{P_{\gamma_0}}\,[\rho(Z_\infty^\delta(K)) - \rho(Z_\infty)]\,d\delta/\mathrm{Leb}(K) =$$
$$\int_K E_{P_{\gamma_0}}\,[\rho(Z_\infty^\delta(K)) - \rho(Z_\infty)]^+\,d\delta/\mathrm{Leb}(K) - \int_K E_{P_{\gamma_0}}\,[\rho(Z_\infty^\delta(K)) - \rho(Z_\infty)]^-\,d\delta/\mathrm{Leb}(K) \le 0. \tag{27}$$

Next as $r(K) \to \infty$

J   Ep^^  [piZtiK))  -  p{Z^(K))]  "  dS/heb{K)  =  J       ^  Ep^^  [p(ZI^^''\k))  -  p(Z^)\  '  dfi  ^  0. 

(28) 
where  r(K)  denotes  the  width  of  the  cube  K  (which  is  centered  at  zero  by  definition)  .  The  last 
conclusion  follows  by  (a)  domination:  for  any  n  €  (0,1)'^,  r{K)  <  oo: 

[p(Z^'<^'(/f))  -  p{Z^)]  '  <  p{Z^)  (29) 

so  that  jBp^ij  Ld(2^'^       {K))  —  p(Zao)\     <  Sp^^p(^tx))  and  (b)  pointwise  convergence:    as  r(/£') ->  oo, 
for  any  p  e  (0,1)'' 

EP,,  [p(Z^'*^'(X))  -  p(Zo.)]  "  ^  0,  (30) 

since (i) $Z_\infty^{\mu r(K)}(K) \to Z_\infty$ for $\mu \in (0,1)^d$ by definition and the convexity Lemma 1, due to finite-dimensional (fi-di) convergence of the objective function that $Z_\infty^{\mu r(K)}(K)$ minimizes to the objective function that $Z_\infty$ minimizes, in probability; (ii) by (29) the collection of variables

$\left\{ \left[\rho(Z_\infty^{\mu r(K)}(K)) - \rho(Z_\infty)\right]^{-} : \mu \in (0,1)^d,\ r(K) < \infty \right\}$ is uniformly integrable.

(27)-(30) also imply that $\int_K E_{P_{\gamma_0}} \left[\rho(Z_\infty^{\delta}(K)) - \rho(Z_\infty)\right]^{+} d\delta / \mathrm{Leb}(K) \to 0$ as $r(K) \to \infty$. Thus $II(K) \uparrow I$ as $K \uparrow \mathbb{R}^d$.

Note that this shows that $\{\hat\gamma_{Bayes}\}$ minimizes $R_\rho(\{\hat\gamma\}, \mathbb{R}^d)$. Suppose that there is a sequence $\{\hat\gamma_n\}$ in $\{\Gamma_n\}$ that achieves a lower risk value than $E_{P_{\gamma_0}} \rho(Z_\infty)$. But then this sequence must achieve a lower value than $II(K)$ for some large $K$, which is impossible by the previous comment. ■

C.5     Lemmas  7-8 

In this section some important properties of

$$\ell_n(z) = L_n(\gamma(z)) / L_n(\gamma)$$

are established, where $\gamma(z) \equiv \gamma + H_n z$ for $z = (u, v) \in \mathbb{R}^d$.

First we note the conditions of Lemma 4 that, uniformly in $\gamma$ in a ball at $\gamma_0$,

$$E_{P_\gamma} \ell_n(z)^{1/2} \le e^{-b|z|}, \qquad E_{P_\gamma} \left| \ell_n(z)^{1/2} - \ell_n(z')^{1/2} \right|^2 \le B|z - z'|.$$

By the proof of Lemma 1.5.2 of IH (1982), the first inequality only needs to hold for $z$ large, and the second inequality only needs to hold for $|z - z'| \le 1$.

For the nonlinear model R, the conditional Hellinger distance $r_2^2(\gamma; \gamma + h)$ is defined as

$$\int\!\!\int \left| f^{1/2}(y - g(x, t + h_1); x, \gamma + h) - f^{1/2}(y - g(x, t); x, \gamma) \right|^2 dy\, F_X(dx).$$

Next we obtain the nonlinear versions of Lemmas 5 and 6 for the model R.


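Lemma 7 If $\exists a > 0$, $A > 0$, such that for $h > 0$ small enough and for $\gamma$ in a ball at $\gamma_0$

$$\inf_\gamma r_2^2(\gamma; \gamma + h) \ge a \max\left(|h_1|, |h_2|^2\right) \quad \text{and} \quad \sup_\gamma r_2^2(\gamma; \gamma + h) \le A\left(|h_1| + |h_2|^2\right),$$

then $\exists b > 0$, $B > 0$, s.t. for all $z$ large (first inequality) and all $z \in \mathbb{R}^d$, $z' \in \mathbb{R}^d$ with $|z - z'| \le 1$ (second inequality), $\exists n_0$ s.t. $\forall n \ge n_0$,

$$\sup_\gamma E_{P_\gamma} \ell_n(z)^{1/2} \le e^{-b|z|}, \qquad \sup_\gamma E_{P_\gamma} \left| \ell_n(z)^{1/2} - \ell_n(z')^{1/2} \right|^2 \le B|z - z'|.$$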
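Proof: For $z$ large, the same calculation as in Lemma 5 leads to

$$E_{P_{\gamma_0}} \ell_n(z)^{1/2} = \left(1 - \tfrac{1}{2}\, r_2^2\left(\gamma_0; \beta_0 + u/n, \alpha_0 + v/\sqrt{n}\right)\right)^n.$$

On the other hand, for $|z - z'|$ small,

$$E_{P_{\gamma_0}} \left| \ell_n(z)^{1/2} - \ell_n(z')^{1/2} \right|^2 \le n\, r_2^2\left(\gamma_0 + (u/n, v/\sqrt{n});\ \gamma_0 + (u'/n, v'/\sqrt{n})\right) \le A\left(|u - u'| + |v - v'|^2\right) \le A\left(|z - z'| + |z - z'|^2\right) \le B|z - z'|. \quad ■$$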
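
As a purely illustrative check of the first bound in Lemma 7, the sketch below uses a hypothetical one-sided exponential model that is not taken from the paper: $Y = \beta + \epsilon$ with $\epsilon \sim \mathrm{Exp}(\alpha)$, so the density jumps from 0 to $\alpha$ at $y = \beta$. Under this assumption the Hellinger affinity $\int \sqrt{f_{\gamma} f_{\gamma + h}}\, dy = 1 - r_2^2/2$ has a closed form, and for i.i.d. data $E_{P_\gamma} \ell_n(z)^{1/2}$ is its $n$-th power. The function names `affinity`, `expected_sqrt_lr` and `limit_value` are ours.

```python
import math

def affinity(h1, h2, alpha=1.0):
    """Hellinger affinity, the integral of sqrt(f0 * f1) dy, where
    f0 is the Exp(alpha) density started at 0 and
    f1 is the Exp(alpha + h2) density started at h1.
    (Hypothetical illustrative model, not the paper's model R.)"""
    a0, a1 = alpha, alpha + h2
    if a1 <= 0.0:
        raise ValueError("alpha + h2 must be positive")
    # When the later density starts, the earlier one has already decayed by
    # exp(-rate * |h1|); its square root enters the affinity.
    earlier_rate = a0 if h1 >= 0.0 else a1
    return 2.0 * math.sqrt(a0 * a1) / (a0 + a1) * math.exp(-0.5 * earlier_rate * abs(h1))

def expected_sqrt_lr(u, v, n, alpha=1.0):
    """E[l_n(z)^(1/2)] at z = (u, v) for n i.i.d. draws: affinity ** n,
    with the location perturbed by u/n and the rate by v/sqrt(n)."""
    return affinity(u / n, v / math.sqrt(n), alpha) ** n

def limit_value(u, v, alpha=1.0):
    """Large-n limit of expected_sqrt_lr in this assumed model."""
    return math.exp(-0.5 * alpha * abs(u) - v * v / (8.0 * alpha * alpha))

for u, v in [(1.0, 1.0), (3.0, 2.0), (-5.0, 4.0)]:
    print(u, v, expected_sqrt_lr(u, v, 10**6), limit_value(u, v))
```

In this assumed model the expectation behaves like $\exp(-\alpha|u|/2 - v^2/(8\alpha^2))$, which is of the form $e^{-b|z|}$ for $|z|$ large, in line with the first inequality of the lemma.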

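Lemma 8 In model R, suppose conditions (E.1)-(E.6) hold. Then $\exists a > 0$, $\exists A > 0$, s.t. $\forall h > 0$ small enough, for $\gamma$ in a ball at $\gamma_0$,

$$\inf_\gamma r_2^2(\gamma; \gamma + h) \ge a \max\left(|h_1|, |h_2|^2\right) \quad \text{and} \quad \sup_\gamma r_2^2(\gamma; \gamma + h) \le A\left(|h_1| + |h_2|^2\right).$$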

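Proof of Lemma 8 For $\varepsilon > 0$ and $\eta > 0$ small enough, let $\gamma = (t, s)$, $h = (\varepsilon, \eta)$. Then

$$
\begin{aligned}
r_2^2(\gamma; \gamma + h) &= E_X \int \left( f^{1/2}(y - g(X, t + \varepsilon) \mid X; t + \varepsilon, s + \eta) - f^{1/2}(y - g(X, t) \mid X; t, s) \right)^2 dy \\
&\le E_X \int_{[g(X,t),\, g(X,t+\varepsilon)]} \left| f(y - g(X, t + \varepsilon) \mid X; t + \varepsilon, s + \eta) - f(y - g(X, t) \mid X; t, s) \right| dy \\
&\quad + E_X \int_{[g(X,t),\, g(X,t+\varepsilon)]^c} \left( f^{1/2}(y - g(X, t + \varepsilon) \mid X; \gamma + h) - f^{1/2}(y - g(X, t + \varepsilon) \mid X; t + \varepsilon, s) \right)^2 dy \\
&\quad + E_X \int_{[g(X,t),\, g(X,t+\varepsilon)]^c} \left| f(y - g(X, t + \varepsilon) \mid X; t + \varepsilon, s) - f(y - g(X, t) \mid X; t, s) \right| dy \\
&\overset{(2)}{\le} 2 E_X \left| g(X, t + \varepsilon) - g(X, t) \right| \left( p(X, \gamma) + q(X, \gamma) \right) \\
&\quad + |\eta|^2 E_X \int_{[g(X,t),\, g(X,t+\varepsilon)]^c} \int_0^1 \left| \frac{\partial f^{1/2}(y - g(X, t + \varepsilon) \mid X, t + \varepsilon, s + \omega\eta)}{\partial s} \right|^2 d\omega\, dy \\
&\quad + |\varepsilon| E_X \int_{[g(X,t),\, g(X,t+\varepsilon)]^c} \int_0^1 \left| \frac{\partial f(y - g(X, t + \omega\varepsilon) \mid X, t + \omega\varepsilon, s)}{\partial t} \right| d\omega\, dy \\
&\overset{(3)}{\le} 2 |\varepsilon| E_X \left( p(X, \gamma) + q(X, \gamma) \right) \int_0^1 \left| \frac{\partial g(X, t + \omega\varepsilon)}{\partial t} \right| d\omega + O(|\eta|^2) + O(|\varepsilon|) \\
&= O(|\varepsilon|) + O(|\eta|^2),
\end{aligned}
$$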


where the interval $[a, b]$ is read as $[b, a]$ when $b < a$, and the bound is uniform in $\gamma$ by (E.4). The first inequality follows by the triangle inequality and from $|a - b|^2 \le |a^2 - b^2|$ for $a \ge 0$ and $b \ge 0$. The first term in the second inequality follows from the fact that for $y \in [g(X, t), g(X, t + \varepsilon)]$ and $\varepsilon$ small enough, $|f(\cdot)| \le 2(p(X, \gamma) + q(X, \gamma))$, and then we integrate over $y$ over that area. The second and third terms in the second inequality are the usual multivariate first-order Taylor expansions and Fubini. The first term in the third inequality follows from Taylor expansion and Fubini. The second term in the third inequality follows from assumption (E.4), while the third term in that inequality follows from assumption (E.2).
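To obtain a bound from below, consider $r_2^2(\gamma; \gamma + h)$:

$$
\begin{aligned}
&= E_X \int_{[g(X,t),\, g(X,t+\varepsilon)]} \left( f^{1/2}(y - g(X, t + \varepsilon) \mid X; \gamma + h) - f^{1/2}(y - g(X, t) \mid X; \gamma) \right)^2 dy \\
&\quad + E_X \int_{[g(X,t),\, g(X,t+\varepsilon)]^c} \left( f^{1/2}(y - g(X, t + \varepsilon) \mid X; \gamma + h) - f^{1/2}(y - g(X, t) \mid X; \gamma) \right)^2 dy.
\end{aligned}
$$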
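We can bound the first term from below uniformly in $\gamma$ by

$$E_X \left[ \left| g(X, t + \varepsilon) - g(X, t) \right| \left| p^{1/2}(X, \gamma) - q^{1/2}(X, \gamma) \right|^2 \right] \ge c|\varepsilon|\, E_X \left| \frac{\partial g(X, t)}{\partial t} \right| \ge c|\varepsilon|,$$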


using assumption (E.3), Taylor expansion and the Cauchy-Schwarz inequality. On the other hand, bound the second term of $r_2^2(\gamma; \gamma + h)$ uniformly below by:

$$E_X \int_{[g(X,t),\, g(X,t+\varepsilon)]^c} \left( f^{1/2}(y - g(X, t + \varepsilon) \mid X; \gamma + h) - f^{1/2}(y - g(X, t) \mid X; t, s) \right)^2 dy$$


2 


y|s(x,<),3(x,(+£)]'  V  "7  / 

Under  assumption  (E.4)  (a),  a  lower  bound  is  \h\^  inf|„|=i  Ex  f,  ,x  t)  gix  (+£)]■=  (  ~ — ^""37^^'^'"^^  ■"  ) 

=  \hf     M^  Ex  I  ^/^^'fa-^ff^(^.*);7)'^  \y^^  ^|^|2)    >  ^|^|2    >  ^|^|2 

On  the  other  hand,  if  assumption  (E.4)  (b)  holds,  the  uniform  lower  bound  is 
Thus, conclude that $\inf_\gamma r_2^2(\gamma; \gamma + h) \ge c \max\left(|\varepsilon|, |\eta|^2\right)$. ■
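
To make the two rates in Lemma 8 concrete, here is a minimal sketch under an assumed one-sided exponential model (a hypothetical example, not the paper's model R): $f(y; t, s) = s\, e^{-s(y - t)}$ for $y \ge t$, with jump location $t$ and rate $s$. For $\varepsilon > 0$ and $\eta > 0$ the squared Hellinger distance has a closed form, which the code uses to check that it is linear in the location shift, quadratic in the rate shift, and sandwiched as in the lemma. The function names and the constants $a = 0.2$, $A = 1$ are ours and apply only to this example with $s = 1$.

```python
import math

def hellinger_sq(eps, eta, s=1.0):
    """Squared Hellinger distance between Exp(s) started at 0 and
    Exp(s + eta) started at eps, for eps >= 0 and eta >= 0
    (assumed illustrative model)."""
    s1 = s + eta
    # Affinity = integral of sqrt(f0 * f1) over the common support y >= eps.
    aff = 2.0 * math.sqrt(s * s1) / (s + s1) * math.exp(-0.5 * s * eps)
    return 2.0 * (1.0 - aff)

def bounds_hold(a=0.2, A=1.0, s=1.0):
    """Check a*max(eps, eta^2) <= r^2 <= A*(eps + eta^2) on a small grid."""
    grid = [k / 1000.0 for k in range(1, 201, 10)]
    for eps in grid:
        for eta in grid:
            r2 = hellinger_sq(eps, eta, s)
            if not (a * max(eps, eta ** 2) <= r2 <= A * (eps + eta ** 2)):
                return False
    return True

# Location shift only: r^2 / eps approaches s (linear rate).
print(hellinger_sq(1e-4, 0.0) / 1e-4)
# Rate shift only: r^2 / eta^2 approaches 1 / (4 s^2) (quadratic rate).
print(hellinger_sq(0.0, 1e-3) / 1e-6)
print(bounds_hold())
```

The linear rate in the location direction comes entirely from the jump in the density, which is why that component converges at rate $n$ rather than $\sqrt{n}$ in the lemmas above.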